Abstract

In his 1947 paper that inaugurated the probabilistic method, Erdős [Erd47] proved
the existence of 2 log n-Ramsey graphs on n vertices. Matching Erdős' result with a
constructive proof is a central problem in combinatorics that has received significant
attention in the literature. The state-of-the-art result was obtained in the celebrated
paper by Barak, Rao, Shaltiel and Wigderson [Ann. Math'12], who constructed a
$2^{2^{(\log\log n)^{1-\alpha}}}$-Ramsey graph, for some small universal constant α > 0.

In this work, we significantly improve the result of Barak et al. and construct
$2^{(\log\log n)^{c}}$-Ramsey graphs, for some universal constant c. In the language of theoretical
computer science, our work resolves the problem of explicitly constructing two-source
dispersers for polylogarithmic entropy.
1 Introduction


Ramsey theory is a branch of combinatorics that studies the unavoidable presence of local
structure in globally-unstructured objects. In the paper that pioneered this field of study,
Ramsey [Ram28] considered an instantiation of this phenomenon in graph theory.

Definition 1.1 (Ramsey graphs). A graph on n vertices is called k-Ramsey if it contains 
no clique or independent set of size k. 

Ramsey showed that there does not exist a graph on n vertices that is (log n)/2-Ramsey.
In his influential paper that inaugurated the probabilistic method, Erdős [Erd47] complemented
Ramsey's result and showed that most graphs on n vertices are 2 log n-Ramsey.
Unfortunately, Erdős' argument is non-constructive, and one does not obtain from his
proof an example of a graph that is 2 log n-Ramsey. In fact, Erdős offered a $100 prize
for matching his result, up to any multiplicative constant factor, by a constructive proof,
that is, for an explicit construction of an O(log n)-Ramsey graph. Erdős'
challenge has received significant attention in the literature, as summarized in Table 1.

Explicitness acquired a new meaning in the computational era. While, classically, a succinct
mathematical formula was considered to be an explicit description, complexity theory suggests
a more relaxed, and arguably more natural, interpretation of explicitness. An object is
deemed explicit if one can efficiently construct it from scratch. More specifically, a
graph on n vertices is explicit if, given the labels of any two vertices u, v, one can efficiently
determine whether there is an edge connecting u and v in the graph. Since the description
length of u, v is 2 log n bits, by efficient we mean, quantitatively, that the running time is
polylog(n).
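To make this notion of explicitness concrete, the following minimal sketch describes a graph by an edge oracle rather than by an adjacency matrix. The inner-product rule is used purely as an illustration and is not the construction of this paper; the name `has_edge` is hypothetical. Vertex labels are integers below n, and the oracle runs in time polynomial in the label length log n.

```python
def has_edge(u: int, v: int) -> bool:
    """Edge oracle for an illustrative graph (assumption: inner-product rule).

    Two distinct vertices are adjacent iff the inner product of their
    bit representations equals 1 modulo 2.
    """
    # u & v keeps the positions where both labels have a 1 bit; the parity
    # of the number of such positions is the inner product over GF(2).
    return bin(u & v).count("1") % 2 == 1
```

The oracle never materializes the n × n adjacency matrix, which is the point of the definition: only the two labels are read.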

Ramsey graphs have an analogous definition for bipartite graphs. A bipartite graph on
two sets of n vertices is bipartite k-Ramsey if it has no k × k complete or empty bipartite
subgraph. One can show that a bipartite Ramsey graph induces a Ramsey graph with
comparable parameters. Thus, constructing bipartite Ramsey graphs is at least as hard as
constructing Ramsey graphs, and it is believed to be a strictly harder problem. Furthermore,
Erdős' argument holds as is for bipartite graphs.

The main result of this paper is an explicit construction of bipartite Ramsey graphs that 
significantly improves previous results. 

Theorem 1.2 (Ramsey graphs). There exists an explicit bipartite $2^{(\log\log n)^{O(1)}}$-Ramsey graph
on n vertices.

In fact, the graph that we construct has a stronger property. Namely, for $k = 2^{(\log\log n)^{O(1)}}$,
any k by k bipartite subgraph has a relatively large subgraph of its own that has density
close to 1/2.

1.1 Two-source dispersers, extractors, and sub-extractors 

In the language of theoretical computer science, Theorem 1.2 yields a two-source disperser 
for polylogarithmic entropy. We first recall some basic definitions. 
Construction                          k(n)                                          Bipartite

[Erd47] (non-constructive)            $2 \log n$                                    ✓
[Abb72]                               $n^{\log 2 / \log 5}$
[Nag75]                               $n^{1/3}$
[Fra77]                               $n^{o(1)}$
[Chu81]                               $2^{O((\log n)^{3/4} (\log\log n)^{1/4})}$
[FW81, Nao92, Alo98, Gro01, Bar06]    $2^{O(\sqrt{\log n \cdot \log\log n})}$
The Hadamard matrix (folklore)        $n^{1/2}$                                     ✓
[PR04]                                $n^{1/2} / 2^{\sqrt{\log n}}$                 ✓
[BKS+10]                              $n^{o(1)}$                                    ✓
[BRSW12]                              $2^{2^{(\log\log n)^{1-\alpha}}}$             ✓
This work                             $2^{(\log\log n)^{O(1)}}$                     ✓

Table 1: Summary of constructions of Ramsey graphs.


Definition 1.3 (Statistical distance). The statistical distance between two distributions X, Y
on a common domain D is defined by

$\mathrm{SD}(X, Y) = \max_{A \subseteq D} \left| \Pr[X \in A] - \Pr[Y \in A] \right|.$

If $\mathrm{SD}(X, Y) \le \varepsilon$ we say that X is ε-close to Y.
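For distributions with small finite support, the statistical distance can be computed directly. The sketch below is only an illustration; the function name and the dictionary representation `{outcome: probability}` are assumptions made here. It uses the standard fact that the maximum over events is attained at the set of outcomes that are more likely under X than under Y, so it equals half of the L1 distance.

```python
def statistical_distance(p: dict, q: dict) -> float:
    """SD(X, Y) for finite distributions given as {outcome: probability}."""
    domain = set(p) | set(q)
    # The maximizing event is A = {a : p(a) > q(a)}; summing |p - q| over the
    # whole domain counts that gap twice, hence the factor 1/2.
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in domain)
```

For example, a uniform bit is at distance 1/2 from a bit that is fixed to 0.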

Definition 1.4 (Min-entropy). The min-entropy of a random variable X is defined by

$H_\infty(X) = \min_{x \in \mathrm{supp}(X)} \log_2 \left( \frac{1}{\Pr[X = x]} \right).$

If X is supported on $\{0,1\}^n$, we define the min-entropy rate of X by $H_\infty(X)/n$. In such a
case, if X has min-entropy k or more, we say that X is an (n, k)-weak-source or simply an
(n, k)-source.
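As a small illustration of Definition 1.4, min-entropy depends only on the heaviest point of the distribution. The helper below is a sketch under the assumption that the distribution is given as a dictionary `{outcome: probability}`; the names are hypothetical.

```python
import math

def min_entropy(p: dict) -> float:
    """H_inf(X): the minimum over the support of log2(1 / Pr[X = x])."""
    # Minimizing log2(1/prob) is the same as maximizing prob.
    heaviest = max(prob for prob in p.values() if prob > 0)
    return -math.log2(heaviest)

def min_entropy_rate(p: dict, n: int) -> float:
    """Min-entropy rate of a source supported on n-bit strings."""
    return min_entropy(p) / n
```

A source that is uniform over 8 strings has min-entropy 3, while a source that puts probability 1/2 on a single string has min-entropy 1 regardless of how the rest of the mass is spread.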

Definition 1.5 (Two-source zero-error dispersers). A function $\mathrm{Disp} \colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^m$
is called a two-source zero-error disperser for entropy k if for any two independent
(n, k)-sources X, Y, it holds that

$\mathrm{supp}(\mathrm{Disp}(X, Y)) = \{0,1\}^m.$
Note that a two-source zero-error disperser for entropy k, with a single output bit, is
equivalent to a bipartite $2^k$-Ramsey graph on $2^n$ vertices on each side. Constructing two-source
dispersers for polylogarithmic entropy is considered a central problem in pseudorandomness,
which we resolve in this paper. Indeed, a $2^{\mathrm{poly}(\log\log n)}$-Ramsey graph on n vertices is
equivalent to a disperser for entropy polylog(n). From the point of view of dispersers, it is
easier to see how challenging Erdős' goal of constructing O(log n)-Ramsey graphs is. Indeed,
these are equivalent to dispersers for entropy log(n) + O(1). Even a disperser for entropy
O(log n) does not meet Erdős' goal, as it translates to a polylog(n)-Ramsey graph.
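The Ramsey side of this equivalence can be checked by brute force on tiny instances. The sketch below is an illustration only: it searches for a K × K set of rows and columns on which a one-bit function is constant, which for K = 2^k is exactly a pair of flat sources witnessing that the function is not a disperser for entropy k. The function names and the choice of the inner-product function as a test case are assumptions made here, and the search is exponential, so it is usable only for very small n.

```python
from itertools import combinations, product

def has_monochromatic_rectangle(f, n: int, K: int) -> bool:
    """True iff some K rows and K columns of the 2^n x 2^n table of f
    form a rectangle on which f is constant."""
    labels = range(2 ** n)
    for rows in combinations(labels, K):
        for cols in combinations(labels, K):
            values = {f(x, y) for x, y in product(rows, cols)}
            if len(values) == 1:
                return True
    return False

def inner_product(x: int, y: int) -> int:
    """Inner product modulo 2 of the bit representations of x and y."""
    return bin(x & y).count("1") % 2
```

For inner product on 2-bit labels, rows {00, 01} and columns {00, 10} give an all-zero 2 × 2 rectangle, whereas the full 4 × 4 table is not constant.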

While Theorem 1.2 already yields a two-source zero-error disperser for polylogarithmic 
entropy, we can say something stronger. 

Theorem 1.6 (Two-source zero-error dispersers). There exists an explicit two-source zero-error
disperser for n-bit sources having entropy k = polylog(n), with $m = k^{\Omega(1)}$ output bits.

Theorem 1.6 gives an explicit zero-error disperser for polylogarithmic entropy, with many 
output bits. In fact, we prove a stronger statement than that. To present it, we recall the 
notion of a two-source extractor, introduced by Chor and Goldreich [CG88]. 

Definition 1.7 (Two-source extractors). A function $\mathrm{Ext} \colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^m$ is
called a two-source extractor for entropy k, with error ε, if for any two independent (n, k)-sources
X, Y, it holds that Ext(X, Y) is ε-close to uniform.

Chor and Goldreich [CG88] proved that there exist two-source extractors with error ε
for entropy $k = \log n + 2\log(1/\varepsilon) + O(1)$ with $m = 2k - 2\log(1/\varepsilon) - O(1)$ output bits. A
central open problem in pseudorandomness is to match this existential proof with an explicit
construction having comparable parameters. Unfortunately, even after almost 30 years, little
progress has been made.

Already in their paper, Chor and Goldreich gave an explicit construction of a two-source
extractor for entropy 0.51n, which is very far from what is obtained by the existential argument.
Nevertheless, it took almost 20 years before any improvement was made. Bourgain [Bou05]
constructed a two-source extractor for entropy $(1/2 - \alpha) \cdot n$, where α > 0
is some small universal constant. An incomparable result was obtained by Raz [Raz05],
who required one source to have min-entropy 0.51n while the other source can have entropy
O(log n).
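To see on a toy scale why entropy around n/2 is a natural threshold for simple constructions, one can measure the error of a one-bit function on flat sources. The sketch below uses the inner-product function as a standard example for the high-entropy regime; this choice, the helper names, and the restriction to flat sources are assumptions made for illustration. For a single output bit, the statistical distance from uniform is |Pr[output = 1] − 1/2|.

```python
from fractions import Fraction

def inner_product(x: int, y: int) -> int:
    """Inner product modulo 2 of the bit representations of x and y."""
    return bin(x & y).count("1") % 2

def error_on_flat_sources(f, xs, ys) -> Fraction:
    """Distance of f(X, Y) from a uniform bit, where X and Y are independent
    and uniform over the collections xs and ys respectively."""
    ones = sum(f(x, y) for x in xs for y in ys)
    return abs(Fraction(ones, len(xs) * len(ys)) - Fraction(1, 2))
```

On two full-entropy 3-bit sources the error is 1/16, while on the entropy-1 sources {000, 001} and {000, 010} the output is constant and the error is 1/2.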

In this paper we construct a pseudorandom object that is stronger than a two-source
zero-error disperser, yet weaker than a two-source extractor. Informally speaking, this is
a function with the following property: inside any two independent weak-sources there exist
two independent subsources, with an amount of entropy comparable to that of the original sources,
restricted to which the function acts as a two-source extractor. To give the formal definition
we first recall the definition of a subsource, introduced in [BKS+10].

Definition 1.8 (Subsource). Given random variables X and X' on $\{0,1\}^n$, we say that X'
is a deficiency d subsource of X, and write $X' \subset X$, if there exists a set $A \subseteq \{0,1\}^n$ such that
$(X \mid A) = X'$ and $\Pr[X \in A] \ge 2^{-d}$. More precisely, for every $a \in A$, $\Pr[X' = a]$ is defined
by $\Pr[X = a \mid X \in A]$, and for $a \notin A$, $\Pr[X' = a] = 0$.
It is instructive to think of a weak-source X as a random variable that is uniformly
distributed over some set S(X). In this case, saying that X' is a subsource of X is the same as saying
that S(X') is a subset of S(X). The deficiency determines the density of S(X') in S(X).
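The definition of a subsource amounts to conditioning on an event. The following sketch, with hypothetical names and a dictionary representation of distributions assumed for illustration, conditions a finite distribution on a set A and reports the smallest deficiency d for which the condition Pr[X ∈ A] ≥ 2^{-d} holds.

```python
import math

def subsource(p: dict, A: set):
    """Return (X', d): X conditioned on the event X in A, and the smallest
    deficiency d = log2(1 / Pr[X in A]) allowed by the definition."""
    mass = sum(prob for x, prob in p.items() if x in A)
    if mass == 0:
        raise ValueError("the set A has probability zero under X")
    conditioned = {x: prob / mass for x, prob in p.items() if x in A}
    return conditioned, math.log2(1 / mass)
```

For a flat source on 8 points and a set A of 2 points, the subsource is flat on A and its deficiency is 2, matching the density interpretation 2/8 = 2^{-2}.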

Definition 1.9 (Two-source sub-extractors). A function

$\mathrm{SubExt} \colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^m$

is called a two-source sub-extractor for outer-entropy $k_{\mathrm{out}}$ and inner-entropy $k_{\mathrm{in}}$, with error
ε, if the following holds. For any independent $(n, k_{\mathrm{out}})$-sources X, Y, there exist min-entropy
$k_{\mathrm{in}}$ subsources $X' \subset X$, $Y' \subset Y$, such that SubExt(X', Y') is ε-close to uniform.

Although we are not aware of the definition of two-source sub-extractors being made explicit in
previous works, we note that the two-source disperser constructed by Barak et al. [BKS+10]
is in fact a two-source sub-extractor. More precisely, for any constant δ > 0, the authors
construct a two-source sub-extractor for outer-entropy δn and inner-entropy poly(δ)n. On
the other hand, the state-of-the-art two-source disperser by Barak et al. [BRSW12] does
not seem to be a sub-extractor.

The main theorem proved in this paper is the following. 

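Theorem 1.10 (Two-source sub-extractors). There exists an explicit two-source sub-extractor
for outer-entropy $k_{\mathrm{out}} = \mathrm{polylog}(n)$ and inner-entropy $k_{\mathrm{in}} = k_{\mathrm{out}}^{\Omega(1)}$, with $m = k_{\mathrm{in}}^{\Omega(1)}$ output
bits and error $\varepsilon = 2^{-k_{\mathrm{out}}^{\Omega(1)}}$.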

We note that a sub-extractor for outer-entropy $k_{\mathrm{out}}$ with m output bits and error ε is a
zero-error disperser for entropy $k_{\mathrm{out}}$ with min(m, log(1/ε)) output bits (the dependence on the
inner-entropy $k_{\mathrm{in}}$ is due to the fact that $m \le k_{\mathrm{in}}$). Indeed, one can simply truncate the output
of the sub-extractor to be short enough so that the error is small enough to guarantee
that every possible output is obtained. In particular, a sub-extractor for outer-entropy $k_{\mathrm{out}}$
and inner-entropy $k_{\mathrm{in}} = 1$, with error ε < 1/2, induces a bipartite $2^{k_{\mathrm{out}}}$-Ramsey graph. Thus,
Theorem 1.10 readily implies Theorem 1.2 and Theorem 1.6.

We further remark that the constants in Theorem 1.10 depend on each other, and so
we made no attempt to optimize them. It is worth mentioning, though, that one can take
$k_{\mathrm{in}} = k_{\mathrm{out}}^{1-\delta}$ for any constant δ > 0, and even $k_{\mathrm{in}} = k_{\mathrm{out}}/\mathrm{polylog}(n)$, while still supporting
outer-entropy $k_{\mathrm{out}} = \mathrm{polylog}(n)$.

We hope that two-source sub-extractors can be of use in some cases where two-source
extractors are required. Further, we believe that the techniques used for constructing
sub-extractors are of value in future constructions of two-source extractors.

1.2 Organization of this paper 

In Section 2 we give an informal overview of the challenge-response mechanism. Section 3 
contains a comprehensive and detailed overview of our construction and analysis. These two 
sections are meant only for building up intuition. The reader may freely skip these sections 
at any point as we make no use of the results that appear in them. 


In Section 4 we give some preliminary definitions and results that we need. Section 5 
contains the formal description of the challenge-response mechanism. In Section 6 we present 
the notions of entropy-trees and tree-structured sources. Then, in Section 7 we give the 
formal construction of our sub-extractor, and analyze it in Section 8. Finally, in Section 9 
we list some open problems. 

2 Overview of the Challenge-Response Mechanism 

Our construction of sub-extractors is based on the challenge-response mechanism that was
introduced in [BKS+10] and refined by [BRSW12]. As we are aiming for a self-contained paper,
in this section we explain how this powerful mechanism works. The challenge-response
mechanism is delicate and fairly challenging to grasp. Thus, to illustrate the way the mechanism
works, we give a toy example in Section 2.4.

2.1 Motivating the challenge-response mechanism 

We start by recalling the notion of a block-source.

Definition 2.1. A random variable X on n-bit strings is called an (n, k)-block-source, or
simply a k-block-source, if the following holds:

• $H_\infty(\mathrm{left}(X)) \ge k$, where left(X) is the length n/2 prefix of X.

• For any $x \in \mathrm{supp}(\mathrm{left}(X))$ it holds that $H_\infty(\mathrm{right}(X) \mid \mathrm{left}(X) = x) \ge k$, where right(X)
is the length n/2 suffix of X.
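For small explicit distributions, the two conditions of Definition 2.1 can be verified directly. The checker below is a sketch for illustration only; it assumes the distribution is given as a dictionary from even-length bit strings to probabilities, and the function name and tolerance parameter are hypothetical.

```python
import math
from collections import defaultdict

def is_block_source(p: dict, k: float, tol: float = 1e-9) -> bool:
    """Check Definition 2.1 for a distribution p over even-length bit strings."""
    left_mass = defaultdict(float)  # Pr[left(X) = a]
    heaviest = defaultdict(float)   # max over b of Pr[X = a followed by b]
    for x, prob in p.items():
        if prob <= 0:
            continue
        a = x[: len(x) // 2]
        left_mass[a] += prob
        heaviest[a] = max(heaviest[a], prob)
    # First condition: H_inf(left(X)) >= k.
    if -math.log2(max(left_mass.values())) < k - tol:
        return False
    # Second condition: H_inf(right(X) | left(X) = a) >= k for every a
    # in the support of left(X).
    return all(-math.log2(heaviest[a] / left_mass[a]) >= k - tol
               for a in left_mass)
```

The uniform distribution on 4-bit strings is a 2-block-source, while a source whose left half is fixed and whose right half is uniform has min-entropy 2 and yet is not even a 1-block-source.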

In a recent breakthrough, Li [Li15] gave a construction of an extractor BExt for two n-bit
sources, where the first source is a polylog(n)-block-source and the second is a weak-source
with min-entropy polylog(n) (see Theorem 4.1). Since our goal is to construct a two-source
sub-extractor for outer-entropy polylog(n), a first attempt would be to show that any source
X with entropy polylog(n) has a subsource X' that is a polylog(n)-block-source. If this
assertion were true then BExt would be a two-source sub-extractor.

This, however, is clearly not the case. Consider, for example, a source X all of whose
entropy is concentrated in its right block right(X). Namely, left(X) is fixed to some constant
and right(X) has min-entropy k. Clearly, $H_\infty(X) \ge k$, yet no subsource of X is even a
1-block-source.

This example holds only when the entropy is no larger than n/2. Indeed, one cannot
squeeze, say, 0.6n entropy into the n/2 bits of right(X). Restricting ourselves, for the moment,
to the very high entropy regime, we ask whether this example is the only problematic example.
In particular, is it true that any source with min-entropy 0.6n is a block-source? The
answer to this question is still no. Nevertheless, one can show that a 0.6n-weak-source on
n bits has a low-deficiency subsource that is a 0.1n-block-source. This observation will be
important for us later on.
Note that by this observation, BExt is a sub-extractor for two sources with min-entropies
0.6n and polylog(n). However, by the example above, BExt by itself is not a sub-extractor for
two sources with min-entropy less than 0.5n. Nevertheless, as we will see, BExt is a central
component in our construction.

Going back to the example, if only there were a magical algorithm that, given a single
sample x ∼ X, could determine correctly whether or not left(X) is fixed
to a constant, then we would be in better shape, as we would know to
concentrate our efforts on right(X). Such an algorithm, however, is too much to hope for.
Indeed, given just one sample x ∼ X, one simply cannot tell whether the left block of X
is fixed or not. Still, the powerful challenge-response mechanism allows one to accomplish
almost this task. In the next section we present a slightly informal version of the
challenge-response mechanism. The actual mechanism is described in Section 5. Our presentation is
somewhat more abstract than the one used in [BKS+10, BRSW12].

2.2 The challenge-response mechanism 

We start by presenting a dream-version of the challenge-response mechanism. 

The challenge-response mechanism — dream version 

For integers ℓ < n, a dream version of the challenge-response mechanism would be a
poly(n)-time computable function

$\mathrm{DreamResp} \colon \{0,1\}^n \times \{0,1\}^n \times \{0,1\}^\ell \to \{\mathrm{fixed}, \mathrm{hasEntropy}\}$

with the following property. For any two independent (n, polylog(n))-sources X, Y, and for
any function $\mathrm{Challenge} \colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^\ell$, the following holds:

• If Challenge(X, Y) is fixed to a constant then

$\Pr_{(x,y) \sim (X,Y)} [\mathrm{DreamResp}(x, y, \mathrm{Challenge}(x, y)) = \mathrm{fixed}] = 1.$

• If $H_\infty(\mathrm{Challenge}(X, Y))$ is sufficiently large then

$\Pr_{(x,y) \sim (X,Y)} [\mathrm{DreamResp}(x, y, \mathrm{Challenge}(x, y)) = \mathrm{hasEntropy}] = 1.$

Note that for any function Challenge, DreamResp distinguishes between the case that
Challenge(X, Y) is fixed and the case that Challenge(X, Y) has enough entropy. Unfortunately,
DreamResp will remain a dream. The actual challenge-response mechanism requires
more from the inputs and has a weaker guarantee on the output. The difference between
the dream version and the actual challenge-response mechanism contributes to why our
sub-extractor is defined the way it is, and so already in this section we present the actual
challenge-response mechanism (in a slightly informal manner).
The actual challenge-response mechanism

For integers k < ℓ < n, the challenge-response mechanism is a poly(n)-time computable
function

$\mathrm{Resp} \colon \{0,1\}^n \times \{0,1\}^n \times \{0,1\}^\ell \to \{\mathrm{fixed}, \mathrm{hasEntropy}\}$

with the following property. For any two independent (n, polylog(n))-sources X, Y, and for
any function $\mathrm{Challenge} \colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^\ell$, the following holds:

• If Challenge(X, Y) is fixed to a constant then there exist deficiency ℓ subsources
$X' \subset X$, $Y' \subset Y$, such that

$\Pr_{(x,y) \sim (X',Y')} [\mathrm{Resp}(x, y, \mathrm{Challenge}(x, y)) = \mathrm{fixed}] = 1.$

• If for any deficiency ℓ subsources $X' \subset X$, $Y' \subset Y$ it holds that
$H_\infty(\mathrm{Challenge}(X', Y')) \ge k$, then

$\Pr_{(x,y) \sim (X,Y)} [\mathrm{Resp}(x, y, \mathrm{Challenge}(x, y)) = \mathrm{hasEntropy}] \ge 1 - 2^{-k}.$

We emphasize the differences between the dream version and the actual challenge-response
mechanism. First, even if Challenge(X, Y) is fixed to a constant, it is not guaranteed that
Resp will correctly identify this on every sample from (X, Y). In fact, it is not even guaranteed
that Resp will identify this correctly with high probability over the sample. The actual
guarantee is that there exist low-deficiency subsources $X' \subset X$, $Y' \subset Y$, such that on any
sample (x, y) ∼ (X', Y'), Resp will correctly output fixed. As our goal is to construct a
sub-extractor, this is good enough for us, as we can “imagine” that we are given samples
from X', Y' rather than from X, Y for the rest of the analysis (we do have to be careful when
dealing with error terms when moving to subsources, but we will ignore this issue for now).

The second thing to notice is that for the challenge-response mechanism to identify the
fact that Challenge(X, Y) has entropy, a stronger assumption is made. Namely, it is not
enough that Challenge(X, Y) has a sufficient amount of entropy; rather, we need
Challenge(X', Y') to have enough entropy for all low-deficiency subsources $X' \subset X$, $Y' \subset Y$. So,
informally speaking, for the challenge-response mechanism to identify entropy, this entropy
must be robust in the sense that it exists even in all low-deficiency subsources.
Further, note that unlike the first case, in the second case Resp introduces a small error.

2.3 The three-types lemma 

The challenge-response mechanism is indeed very impressive. However, the mechanism only
distinguishes between two extreme cases: no entropy versus high entropy. It would be much more
desirable to distinguish between low entropy and high entropy. Indeed, what if
the entropy in the left block of a source is too low to work with, yet the block is not constant
and so the challenge-response mechanism is inapplicable?

The next lemma shows that if we are willing to work with subsources then this is a non-issue.
Namely, every source has a low-deficiency subsource with a structure suitable for the
challenge-response mechanism. We present here a slightly informal version of this lemma.
The reader is referred to Lemma 6.5 for a formal statement.

Lemma 2.2 (The three-types lemma). For any (n, k)-source X and an integer b < k, there
exists a deficiency 1 subsource $X' \subset X$ such that (at least) one of the following holds:

• X' is a b-block-source.

• $H_\infty(\mathrm{left}(X')) \ge k - b$.

• left(X') is fixed to a constant and $H_\infty(\mathrm{right}(X')) \ge k - b$.

Take for example $b = \sqrt{k}$. Lemma 2.2, which is a variant of the two-types lemma by
Barak et al. [BRSW12], states a fairly surprising fact about weak-sources. Any source X
has a deficiency 1 subsource X' with a useful structure. If X' is not a block-source then
either essentially all of the entropy already appears in left(X'), or otherwise left(X') is fixed
to a constant (great news for users of the challenge-response mechanism) and right(X') has
essentially all the entropy of X.

2.4 Playing with the challenge-response mechanism 

Lemma 2.2 is an important supplement to the challenge-response mechanism. However, it
is still not clear how the two together can be used to break the “0.5 barrier” discussed
in Section 2.1, for example, how they can be used together to give a sub-extractor for
outer-entropies 0.4n, polylog(n).

Let us try to see what can be said. Say X is an (n, 0.4n)-source. By Lemma 2.2, applied
with $b = \sqrt{n}$, there exists a deficiency 1 subsource X' of X, such that one of the following
holds:

• X' is a $\sqrt{n}$-block-source.

• $H_\infty(\mathrm{left}(X')) \ge 0.4n - \sqrt{n} \ge 0.3n$.

• left(X') is fixed to a constant and $H_\infty(\mathrm{right}(X')) \ge 0.4n - \sqrt{n} \ge 0.3n$.

Note that in the second case, left(X') has entropy-rate 0.6, since it has min-entropy at least
0.3n on n/2 bits. Thus, it has a deficiency 1 subsource that is a block-source. Similarly, in the
third case, right(X') has a deficiency 1 subsource that is a block-source. Thus, any
(n, 0.4n)-source has a deficiency 2 subsource $X'' \subset X$ such that at least one of
X'', left(X''), right(X'') is a $\sqrt{n}$-block-source.

Given this, even without resorting to the challenge-response mechanism, we know that
at least one of BExt(X'', Y), BExt(left(X''), Y), BExt(right(X''), Y) is close to uniform. The
challenge-response mechanism allows us to obtain something stronger. Although we will not
be able to get a sub-extractor for outer-entropies 0.4n, polylog(n) this way, it is instructive
to see the technique being used. Set BExt to output ℓ = o(k) bits, where k = polylog(n) is
the outer-entropy of the second source. Consider the following algorithm.


The toy algorithm.

On input $x, y \in \{0,1\}^n$:

• Compute z(x, y) = Resp(x, y, BExt(left(x), y)).

• If z = fixed, declare that BExt(right(x), y) is uniform.

• Otherwise, declare that one of BExt(x, y), BExt(left(x), y) is uniform.
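The control flow of the toy algorithm can be sketched as follows. Since the actual Resp and BExt are only constructed later in the paper, they are passed in here as arbitrary callables; the function name `toy_algorithm`, the string representation of inputs, and the returned list of candidates are assumptions made for illustration.

```python
from typing import Callable, List

def toy_algorithm(
    x: str,
    y: str,
    resp: Callable[[str, str, str], str],   # placeholder for Resp
    bext: Callable[[str, str], str],        # placeholder for BExt
) -> List[str]:
    """Return the strings among which the algorithm declares one to be uniform."""
    half = len(x) // 2
    left, right = x[:half], x[half:]
    # The challenge is BExt applied to the left block of x.
    z = resp(x, y, bext(left, y))
    if z == "fixed":
        # The left block looks fixed, so the entropy must sit in the right block.
        return [bext(right, y)]
    # Otherwise either x itself or its left block is treated as a block-source.
    return [bext(x, y), bext(left, y)]
```

With stub callables the function simply exposes which one or two candidate outputs the declaration refers to; it makes no claim about their distribution.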

The above algorithm does not look very impressive. Essentially, it only cuts down our 
lack of knowledge a bit. Instead of declaring that one of three strings is close to uniform, it is 
able to declare that one of at most two strings is close to uniform. Nevertheless, as mentioned 
above, it is instructive to see the proof technique on this simple toy example. Moreover, as 
we will see in Section 3.2, this algorithm is a special case of an algorithm by [BRSW12] that 
will be important to our construction. We now prove that the algorithm’s declaration is 
correct. More precisely, we prove the following. 

Claim 2.3. Let X be an (n, 0.4n)-source, and let Y be an independent (n, polylog(n))-source.
Then, there exist deficiency O(ℓ) subsources $X' \subset X$, $Y' \subset Y$, such that with probability 1
over (x, y) ∼ (X', Y') the declaration of the algorithm is correct.

The proof of Claim 2.3 showcases the following three facts about low-deficiency subsources.
None of these facts is very surprising, but we make extensive use of them throughout
the paper, and it is beneficial to see these facts in action on a simple example. Here we
give slightly informal statements. For the formal statements see Fact 4.3, Fact 4.4, and
Lemma 4.5.

Fact 2.4. If $H_\infty(X) \ge k$ and X' is a deficiency d subsource of X then $H_\infty(X') \ge k - d$.

Fact 2.5. Let X be a random variable on n-bit strings. Let $f \colon \{0,1\}^n \to \{0,1\}^\ell$ be an
arbitrary function. Then, there exist $a \in \{0,1\}^\ell$ and a deficiency ℓ subsource X' of X such
that f(x) = a for every $x \in \mathrm{supp}(X')$.

Lemma 2.6. Let X be a k-block-source, and let X' be a deficiency d subsource of X. Then,
X' is a (k − d)-block-source.
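The averaging argument behind Fact 2.5 is easy to run on small examples: since f takes at most 2^ℓ values, its most likely value has probability at least 2^{-ℓ}, and conditioning on it yields the required subsource. The sketch below illustrates this; the function name and the dictionary representation are assumptions made here.

```python
import math
from collections import defaultdict

def fix_function_value(p: dict, f):
    """Return (a, X', d): the most likely value a of f(X), the subsource X'
    obtained by conditioning on f(X) = a, and its deficiency d."""
    mass = defaultdict(float)  # Pr[f(X) = value]
    for x, prob in p.items():
        mass[f(x)] += prob
    a = max(mass, key=mass.get)
    sub = {x: prob / mass[a] for x, prob in p.items() if f(x) == a}
    return a, sub, math.log2(1 / mass[a])
```

If f outputs ℓ bits, the returned deficiency never exceeds ℓ, and f is constant on the support of the returned subsource.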

Proof of Claim 2.3. We start by applying the three-types lemma as discussed above so as to obtain a
deficiency 2 subsource of X with the properties listed above. For ease of notation, we denote this
source by X.

Assume first that left(X) is fixed. Note that in this case BExt(left(X), Y) is a deterministic
function of Y. Since the output length of BExt is ℓ, Fact 2.5 implies that there exists a
deficiency ℓ subsource $Y' \subset Y$ such that BExt(left(X), Y') is fixed to a constant. We are
now in a position to apply the challenge-response mechanism so as to conclude that there exist
deficiency ℓ subsources $X' \subset X$, $Y'' \subset Y'$ such that

$\Pr[z(X', Y'') = \mathrm{fixed}] = 1. \qquad (2.1)$
Note that in this case, $H_\infty(\mathrm{right}(X)) \ge 0.3n$. Therefore, by Fact 2.4,
$H_\infty(\mathrm{right}(X')) \ge 0.3n - \ell \ge 0.29n$. Since right(X') has length n/2, we have that right(X')
has entropy-rate 0.58, and so by Lemma 2.2, there exists a deficiency 1 subsource X'' of X' such that
right(X'') is an Ω(n)-block-source. Similarly, since $H_\infty(Y) = k = \omega(\ell)$, Fact 2.4 implies that
$H_\infty(Y'') \ge k - 2\ell = \mathrm{polylog}(n)$. Thus, BExt(right(X''), Y'') is close to uniform.

To summarize, in the case that left(X) is fixed, there exist deficiency O(ℓ) subsources
$X'' \subset X$, $Y'' \subset Y$ on which the algorithm correctly declares that BExt(right(X''), Y'') is
uniformly distributed.

Consider now the case that left(X) is not fixed. By Lemma 2.2, either X is a $\sqrt{n}$-block-source,
or otherwise $H_\infty(\mathrm{left}(X)) \ge 0.3n$, and so left(X) has a deficiency 1 subsource that
is a $\sqrt{n}$-block-source. Therefore, the algorithm’s declaration in this case is correct even on
deficiency O(1) subsources. □

In this section we gained some familiarity with the challenge-response mechanism and
with the three-types lemma (Lemma 2.2), which is an important supplement to the mechanism.
Hopefully, this experience will assist the reader in the sequel.


3 Overview of the Construction and Analysis 

In this section, we present our construction of sub-extractors and give a comprehensive and
detailed overview of the proof, though we allow ourselves to be somewhat imprecise whenever
this contributes to the presentation. The formal proof, which can easily be recovered from the
content of this section, appears in Section 8. In Section 3.1, we introduce the notions of
entropy-trees and tree-structured sources. A variant of this notion was used by [BRSW12].
Then, in Section 3.2, we overview the approach taken by [BRSW12] for their construction of
two-source dispersers. Once the results needed from [BRSW12] are in place, in Section 3.3
we give an overview of the rest of our construction. In the following sections of this overview
we give further details.

3.1 Entropy-trees and tree-structured sources 

Motivating the notion of an entropy-tree 

We already saw that a source with entropy-rate 0.6 has a deficiency 1 subsource that is a
block-source. By applying the three-types lemma (Lemma 2.2), we saw that any source X
with entropy-rate 0.4 has a deficiency 2 subsource that is either a block-source, or otherwise
one of left(X), right(X) is a block-source. We, however, are interested in sources X with only
polylog(n) entropy. Is it true that there is a block-source “lying somewhere” in X (or in a
low-deficiency subsource of X)? Yes it is, though we have to dig deeper.

Say X has min-entropy k. Lemma 2.2, set with $b = \sqrt{k}$, implies that there exists
a deficiency 1 subsource X' of X that is either a $\sqrt{k}$-block-source, or otherwise one of
left(X'), right(X') has almost all the entropy of X. In other words, if X' is not a block-source
then the entropy-rate of one of left(X'), right(X') has almost doubled.

Assume that X' is not a block-source, and that left(X') has entropy $k - \sqrt{k}$. By
Lemma 2.2, set again with $b = \sqrt{k}$, there exists a deficiency 1 subsource $X'' \subset X'$ such
that either left(X'') is a $\sqrt{k}$-block-source, or otherwise one of left(left(X'')), right(left(X''))
has min-entropy $k - 2\sqrt{k}$. That is, if left(X'') is also not a $\sqrt{k}$-block-source then the
entropy-rate of one of left(left(X'')), right(left(X'')) is almost four times the original entropy-rate of
X.

At some point we are bound to find a block-source. Indeed, if we failed to find a block-source
in the first r iterations then there is a deficiency r subsource $X^{(r)}$ of X and a length
$n \cdot 2^{-r}$ block of $X^{(r)}$ that has entropy $k - r\sqrt{k}$. Thus, if $k - r\sqrt{k} > 0.6 n \cdot 2^{-r}$ then this block
has a deficiency 1 subsource that is a block-source. In particular, if $k = \omega(\log^2 n)$ then a
block-source will be found in the first log(n/k) + O(1) iterations.
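The stopping condition above is simple arithmetic, and it can be helpful to evaluate it for concrete parameters. The following sketch (the function name is hypothetical) returns the first iteration r at which the remaining entropy k − r√k exceeds 0.6 times the current block length n · 2^{-r}.

```python
import math

def first_good_iteration(n: int, k: float):
    """Smallest r with k - r*sqrt(k) > 0.6 * n * 2**(-r), or None if no such
    r exists among the at most log2(n) halving steps."""
    for r in range(int(math.log2(n)) + 1):
        remaining_entropy = k - r * math.sqrt(k)
        block_length = n * 2.0 ** (-r)
        if remaining_entropy > 0.6 * block_length:
            return r
    return None
```

For n = 2^20 and k = 10^4 the condition first holds at r = 7, in line with log(n/k) ≈ 6.7 plus a constant.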

As we apply Lemma 2.2 at most log n times, and since in each application we move to a
deficiency 1 subsource, we conclude that every (n, k)-source has a deficiency log n subsource
that contains a block-source. This block-source can be found by following a certain “path of 
entropy” that determines which of the two halves of the current block of the source contains 
essentially all the entropy. 

Entropy-trees 

The above discussion naturally leads to what we call an entropy-tree and to sources that have
a tree-structure. An entropy-tree is a complete rooted binary tree T where some of its nodes
are labeled by one of the following labels: H, B, F, which stand for high entropy, block-source, and
fixed, respectively. The nodes of an entropy-tree are labeled according to rules that capture
any possible entropy structure of a subsource obtained by the process described above.

[Figure 1: An example of an entropy-tree. Unlabeled nodes and edges to them do not appear
in the figure.]

The rules are:


• The root of T, denoted by root(T), is labeled by either H or B, expressing the fact that
we assume the source itself has high entropy, and may even be a block-source.

• There is exactly one node in T that is labeled by B, denoted by $v_B(T)$. This expresses
the fact we proved, namely, that if one digs deep enough, a block-source will be found. The
uniqueness of the node labeled by B captures the fact that we terminate the process
once a block-source is found.

• If v is a non-leaf that has no label, or is labeled by F or B, then its sons have
no label. This rule captures the fact that a node has no label if we are not interested
in the block of the source that is associated with the node. Thus, if a block is fixed we
do not try to look for a block-source inside it. Similarly, if the node is a block-source
we stop the search.

• If v is a non-leaf that is labeled by H then the sons of v can only be labeled according
to the following rules:

  - If leftSon(v) is labeled by F then rightSon(v) is labeled by either H or B.

  - If leftSon(v) is labeled by either H or B then rightSon(v) has no label.

Note that these rules capture the guarantee of Lemma 2.2. 
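The labeling rules can be turned into a small validity check. In the sketch below, which is an illustration rather than part of the construction, a labeling is a dictionary from root-to-node paths (strings over "L" and "R", with the empty string for the root) to labels, and unlabeled nodes are simply absent. Treating an H-labeled non-leaf whose left son is unlabeled as invalid is an assumption made here, as the rules above do not address that case explicitly.

```python
def is_entropy_tree(labels: dict, depth: int) -> bool:
    """Check the entropy-tree rules for a labeling of a depth-`depth` tree."""
    for path, lab in labels.items():
        if lab not in ("H", "B", "F") or len(path) > depth:
            return False
        if any(step not in "LR" for step in path):
            return False
    # The root is labeled by H or B, and exactly one node is labeled by B.
    if labels.get("") not in ("H", "B"):
        return False
    if sum(lab == "B" for lab in labels.values()) != 1:
        return False
    for path, lab in labels.items():
        # Sons of unlabeled, F, or B nodes are unlabeled, so every labeled
        # non-root node must have a parent labeled H.
        if path and labels.get(path[:-1]) != "H":
            return False
        if lab == "H" and len(path) < depth:
            left, right = labels.get(path + "L"), labels.get(path + "R")
            if left == "F":
                if right not in ("H", "B"):
                    return False
            elif left in ("H", "B"):
                if right is not None:
                    return False
            else:
                return False  # assumption: left son of an H node is labeled
    return True
```

For instance, a root labeled H with a fixed left son and an H right son whose own left son is the block-source passes the check, while labeling both sons of the root by H and B does not.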

The entropy-path. With every entropy-tree T we associate a path that we call the
entropy-path of T. This is the unique path from root(T) to $v_B(T)$. We say that a path
in T contains the entropy-path if it starts at root(T) and goes through $v_B(T)$ (note that we
allow an entropy-tree to have nodes that are descendants of $v_B(T)$; we just do not allow
these nodes to be labeled).

Tree-structured sources 

Now that we have defined entropy-trees, we can say what it means for a source to have
a T-structure, for some entropy-tree T. To this end we need to introduce some notation.
Let n be an integer that is a power of 2. With a string $x \in \{0,1\}^n$, we associate a depth
log n complete rooted binary tree T, where with each node v of T we associate a substring $x_v$
of x in the following natural way: $x_{\mathrm{root}(T)} = x$, and for $v \ne \mathrm{root}(T)$, if v is the left son of its
parent, then $x_v = \mathrm{left}(x_{\mathrm{parent}(v)})$; otherwise, $x_v = \mathrm{right}(x_{\mathrm{parent}(v)})$.

Let T be a depth $\log n$ entropy-tree. An n-bit source X is said to have a T-structure
with parameter k if for any node v in T the following holds:

• If v is labeled by F then $X_v$ is fixed to a constant.

• If v is labeled by H then $H_\infty(X_v) \geq k$.

• If v is labeled by B then $X_v$ is a $\sqrt{k}$-block-source.

With the notions of entropy-trees and tree-structured sources, we can summarize the
discussion of this section by saying that any $(n,k)$-source, with $k = \omega(\log^2 n)$, has a deficiency
$\log n$ subsource that has a T-structure with parameter $\Omega(k)$ for some entropy-tree T (that
depends on the underlying distribution of X). Therefore, for the purpose of constructing
sub-extractors, we may assume that we are given two independent samples from tree-structured
sources rather than from general weak-sources. Further, by Fact 2.4 and by Lemma 2.6 it
follows that if X' is a deficiency d subsource of a source having a T-structure with parameter
$k = \omega(d)$, then X' has a T-structure with parameter $\Omega(k)$. In particular, we can move
to $o(k)$-deficiency subsources throughout the analysis and still maintain the original
tree-structure of the source.

3.2 Identifying the entropy-path 

Tree-structured sources certainly seem nicer to work with than general weak-sources.
However, it is still not clear what this structure is good for if we do not have any information
regarding the entropy-path.

Remarkably, by applying the challenge-response mechanism in a carefully chosen manner,
Barak et al. [BRSW12] were able to identify the entropy-path of the entropy-tree T given
just one sample $x \sim X$, where X is a T-structured source, and one sample $y \sim Y$,
where Y is a general weak-source that is independent of X. We now turn to describe the
algorithm used by [BRSW12]. Before we do so, it is worth mentioning that Barak et al. 
proved something somewhat different. Indeed, they considered a variant of entropy-trees 
and had to prove something a bit stronger than what we need. Nevertheless, their proof 
can be adapted in a straightforward manner to obtain the result we describe next. For 
completeness, we reprove what is needed for our construction in Section 8.1. 

What does it mean to identify the entropy-path? 

What do we mean by saying that an algorithm identifies the entropy-path of an
entropy-tree T? This is an algorithm that on input $x, y \in \{0,1\}^n$, outputs a depth $\log n$ rooted
complete binary tree and a marked root-to-leaf path on that tree, denoted by $p_{\mathrm{observed}}(x,y)$,
the observed entropy-path. Ideally, the guarantee of the algorithm would have been the
following. If x is sampled from a T-structured source X and y is sampled independently
from a weak-source Y, then $p_{\mathrm{observed}}(X,Y)$ contains the entropy-path of T with probability 1.
That is, for any $(x,y) \in \mathrm{supp}((X,Y))$, if we draw the path $p_{\mathrm{observed}}(x,y)$ on the entropy-tree
T then this path starts at root(T) and goes through $v_B(T)$.

Note that the path $p_{\mathrm{observed}}(x,y)$ is allowed to continue arbitrarily after visiting $v_B(T)$.
Requiring that $p_{\mathrm{observed}}(x,y)$ stop at $v_B(T)$ is a very strong requirement. In particular,
it would conclude the construction of the sub-extractor. Indeed, once the block-source $X_{v_B(T)}$
is found, one can simply output $\mathrm{BExt}(X_{v_B(T)}, Y)$.

This was an ideal version of what we mean by identifying an entropy-path. For our
needs, we will be satisfied with a weaker guarantee. Following [BRSW12], we will show that
there exist low-deficiency subsources $X' \subset X$, $Y' \subset Y$, such that with high probability over
$(x,y) \sim (X',Y')$ it holds that $p_{\mathrm{observed}}(x,y)$ contains the entropy-path of T.

The fact that we only have a guarantee on low-deficiency subsources is good enough
for us as we are aiming for a sub-extractor. The fact that there is an error (that did not
appear in the analysis of [BRSW12]) should be handled with some care. Indeed, note that
by moving to a deficiency d subsource, an error of $\varepsilon$ in the original source can grow to at most
$2^d \cdot \varepsilon$ restricted to the subsource. We will make sure that the error is negligible compared
to the deficiency we consider in the rest of the analysis. Thus, from here on we will forget
about the error introduced in this step of identifying the entropy-path.
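
The bound on the error growth can be checked with a one-line conditioning argument. In the sketch below, A denotes the set defining a deficiency d subsource X' of X, so that $\Pr[X \in A] \geq 2^{-d}$; the symbol E for the set of "bad" samples is introduced here only for illustration.

```latex
\[
\Pr_{x \sim X'}[x \in E]
  \;=\; \frac{\Pr_{x \sim X}[x \in E \cap A]}{\Pr_{x \sim X}[x \in A]}
  \;\le\; \frac{\Pr_{x \sim X}[x \in E]}{2^{-d}}
  \;\le\; 2^{d} \cdot \varepsilon .
\]
```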

The algorithm of [BRSW12] for identifying the entropy-path 

We now describe the algorithm used by [BRSW12] for identifying the entropy-path of an 
entropy-tree T. The basic idea was depicted already in the toy algorithm from Section 2.4. 
In fact, what the toy algorithm actually managed to do was to find the entropy-path in
depth 2 tree-structured sources. 

We first note that if $\mathrm{root}(T) = v_B(T)$ then any observed entropy-path will contain $v_B(T)$.
So, we may assume that this is not the case. Let v be the parent of $v_B(T)$ in T. As a first
step, we want to determine which of the two sons of v is $v_B(T)$. To this end, we will use the
toy algorithm from Section 2.4. More precisely, node v declares that its left son is $v_B(T)$ if
and only if

\[ \mathrm{Response}(x_v, y, \mathrm{BExt}(x_{\mathrm{leftSon}(v)}, y)) = \mathrm{hasEntropy}. \tag{3.1} \]

Let us pause for a moment to introduce some notation. If Equation (3.1) holds, we say
that the node v (x, y)-favors its left son; otherwise, we say that v (x, y)-favors its right son.
Moreover, we define the good son of v to be $v_B(T)$. More generally, for a node $u \neq v_B(T)$
that is an ancestor of $v_B(T)$, we define the good son of u to be its unique son that is an
ancestor of $v_B(T)$. Note that by following the good sons from root(T) to $v_B(T)$ one recovers
the entropy-path of T. Thus, to recover the entropy-path of T it is enough that any ancestor
of $v_B(T)$ on the entropy-path of T favors its good son.

By following the proof of Claim 2.3, one can see that if $X_{\mathrm{leftSon}(v)}$ is fixed then
Equation (3.1) holds with probability 0 on some low-deficiency subsources of X, Y. Further, by
the challenge-response mechanism together with Fact 2.4 and Lemma 2.6, one can show that
if $\mathrm{leftSon}(v) = v_B(T)$ then with high probability over (X, Y), Equation (3.1) holds. Observe
that by the definition of an entropy-tree, these are the only two possible cases.

We showed how $v_B(T)$ can convince its parent v that it is its good son. The trick was
to use the block-source-ness of $X_{v_B(T)}$ so as to generate a proper challenge. Considering one
step further, we ask the following. If u is the parent of v, how can v convince u that it is
its good son? After all, v is not a block-source. The elegant solution of Barak et al. is as
follows. Given $x, y \in \{0,1\}^n$, the challenge of v will contain not only $\mathrm{BExt}(x_v, y)$ but also
$\mathrm{BExt}(x_w, y)$, where w is v's favored son. Thus, if v's favored son happens to be its good son
$v_B(T)$, then the challenge posed by v will be left unresponded by u.

More generally, node v decides which of its two sons it (x, y)-favors not according to
Equation (3.1) but rather according to whether or not

\[ \mathrm{Response}(x_v, y, \mathrm{Challenge}(x_{\mathrm{leftSon}(v)}, y)) = \mathrm{hasEntropy}, \tag{3.2} \]

where $\mathrm{Challenge}(x_{\mathrm{leftSon}(v)}, y)$ is a matrix with at most $\log n$ rows (according to the depth
of the tree) that contains $\mathrm{BExt}(x_{\mathrm{leftSon}(v)}, y)$ as a row, as well as $\mathrm{BExt}(x_w, y)$, where w is the
(x, y)-favored son of leftSon(v), and also $\mathrm{BExt}(x_r, y)$, where r is the (x, y)-favored son of w,
etc.
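
The recursive rule of Equation (3.2) can be sketched in code as follows. This is only an illustration under stated assumptions: `bext` and `response` are caller-supplied stand-ins for BExt and Response (their real constructions are not reproduced here), nodes are encoded by a hypothetical string of 'L'/'R' steps from the root, and the path is assumed to consist of the nodes at depths 0 through log(n) - 1.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def node_substring(x: str, v: str) -> str:
    """Block x_v for the node v, encoded as a string of 'L'/'R' steps from the root."""
    for step in v:
        half = len(x) // 2
        x = x[:half] if step == "L" else x[half:]
    return x


def observed_entropy_path(
    x: str,
    y: str,
    bext: Callable[[str, str], str],                       # stand-in for BExt
    response: Callable[[str, str, Tuple[str, ...]], str],  # stand-in for Response
) -> List[str]:
    """Follow favored sons from the root, deciding each step as in Equation (3.2)."""
    log_n = len(x).bit_length() - 1
    last_depth = log_n - 1  # assumption: the path has nodes at depths 0..log(n)-1

    def challenge(u: str) -> Tuple[str, ...]:
        # One row for u, one for the favored son of u, and so on down the tree.
        rows = [bext(node_substring(x, u), y)]
        while len(u) < last_depth:
            u = favored_son(u)
            rows.append(bext(node_substring(x, u), y))
        return tuple(rows)

    @lru_cache(maxsize=None)
    def favored_son(v: str) -> str:
        # v favors its left son iff the left son's challenge is left unresponded.
        verdict = response(node_substring(x, v), y, challenge(v + "L"))
        return v + "L" if verdict == "hasEntropy" else v + "R"

    path = [""]
    while len(path[-1]) < last_depth:
        path.append(favored_son(path[-1]))
    return path
```

Memoizing `favored_son` keeps the number of calls to `response` linear in the number of nodes, since the challenge of a node reuses the decisions already made below it.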

The strategy of [BRSW12] for determining the output

Having found the entropy-path of T , we are in a much better shape. We know that one of 
the nodes on the path is a block-source. The trouble is that we still do not know which one. 
We conclude this section by saying only a few words about the strategy taken by [BRSW12] 
as at this point our strategy deviates from theirs. It is worth mentioning that the strategy 
taken by Barak et al. for determining the output is one place in the construction that poses 
a bottleneck for supporting entropy $o(2^{\sqrt{\log n}})$. It is also the reason why the number of output
bits in their construction is at most O (log log n) and why the construction can only be a 
disperser rather than a sub-extractor. 

In order to output a non-constant bit, as required by a 1 output bit disperser, Barak et al.
assumed that the source X has some more structure. Not only should X have a T-structure,
but it is also required that $\mathrm{left}(X_{v_B(T)})$ has its own tree-structure. In particular, somewhere in
the left block of $X_{v_B(T)}$ there should be a second block-source. Note that this extra structure
required from X can be assumed with almost no cost in parameters. Indeed, after applying
the process from Section 3.1 to X so as to obtain a deficiency $\log n$ subsource X' of X that has
a tree-structure, one can simply apply the process again, this time to $\mathrm{left}(X'_{v_B(T)})$, so as to find
a deficiency $2 \log n$ subsource of X with the desired structure.

Having this “double-block-source” structure, Barak et al. were able to carefully tune the
parameters of the challenge-response mechanism so that with some probability $v_B(T)$ will
be convinced that $X_{\mathrm{leftSon}(v_B(T))}$ contains a block-source, yet with some probability it will fail
to notice this. With some more delicate work, the fact that $v_B(T)$’s decision is not constant
can be carried upwards all the way to root(T) and in turn, can be translated to an output
bit that is non-constant.

3.3 The strategy for the rest of our construction 

To carry out the analysis of our sub-extractor, we require even more structure from our sources
than the structure required by [BRSW12]. First, we require both X and Y to have a
tree-structure. In previous works [BKS+10, BRSW12], the second source Y was used mainly
to “locate the entropy” of the source X, and the only assumption on Y was that it has a
sufficient amount of entropy. We, however, will make use of the structure of Y as well.

Second, we will need X to have a “triple-block-source” structure. That is, we assume
that X has a $T_X$-structure with a node $v_{\mathrm{top}}(T_X)$ corresponding to the block-source $X_{v_{\mathrm{top}}(T_X)}$.
We then assume that $\mathrm{left}(X_{v_{\mathrm{top}}(T_X)})$ has its own tree-structure with a node $v_{\mathrm{mid}}(T_X)$
corresponding to a second block-source $X_{v_{\mathrm{mid}}(T_X)}$. Finally, we require that $\mathrm{left}(X_{v_{\mathrm{mid}}(T_X)})$ has its
own tree-structure with a node $v_{\mathrm{bot}}(T_X)$ that corresponds to a third block-source $X_{v_{\mathrm{bot}}(T_X)}$.

As for Y, our analysis only requires a “double-block-source” structure. However, to keep
the notation cleaner we will assume that Y also has a triple-block-source structure. In
particular, the entropy-tree of Y, denoted by $T_Y$, has nodes that we denote by $u_{\mathrm{top}}(T_Y)$,
$u_{\mathrm{mid}}(T_Y)$, and $u_{\mathrm{bot}}(T_Y)$, analogous to $v_{\mathrm{top}}(T_X)$, $v_{\mathrm{mid}}(T_X)$, and $v_{\mathrm{bot}}(T_X)$ in $T_X$.

In fact, we allow ourselves to change the definition of an entropy-tree given in the previous
section so that it will capture this “triple-block-source” structure, but the reader should not
worry about these details at this point. For the formal definition see Section 6.

Figure 2: The “triple-block-source” structure of an entropy-tree.

Given this structure of the sources, we are ready to give a high-level overview of our
construction. In the subsequent sections of the overview, we give further details. Let X be
a $T_X$-structured source and let Y be a $T_Y$-structured source, for some entropy-trees $T_X$, $T_Y$.
At the first step, the sub-extractor identifies the entropy-path of $T_X$ and the entropy-path
of $T_Y$ using the algorithm of [BRSW12]. More precisely, given the samples x, y, we compute
two paths denoted by

\[ p_{\mathrm{observed}}(x,y) = v_0(x,y), v_1(x,y), \ldots, v_{\log(n)-1}(x,y), \]
\[ q_{\mathrm{observed}}(x,y) = u_0(x,y), u_1(x,y), \ldots, u_{\log(n)-1}(x,y). \]

This step must be done with some care. For technical reasons, we cannot use x, y to first
find the entropy-path of $T_X$ and then to find the entropy-path of $T_Y$. Thus, in some sense,
the two paths must be computed simultaneously.

At this point we have that there exist low-deficiency subsources $X' \subset X$, $Y' \subset Y$, such
that for any $(x,y) \in \mathrm{supp}((X',Y'))$ it holds that $p_{\mathrm{observed}}(x,y)$ (resp. $q_{\mathrm{observed}}(x,y)$) contains
the entropy-path of $T_X$ (resp. $T_Y$). In particular, we have that $v_{\mathrm{depth}(v_{\mathrm{top}}(T_X))}(X',Y')$ is fixed
to $v_{\mathrm{top}}(T_X)$, and the same holds for $v_{\mathrm{mid}}(T_X)$, as well as for $u_{\mathrm{top}}(T_Y)$, $u_{\mathrm{mid}}(T_Y)$, and
$u_{\mathrm{bot}}(T_Y)$. To keep the notation clean, we write X, Y for X', Y' in this proof overview. That
is, we assume that the entropy-paths are correctly identified on the tree-structured sources.

At the second step of the algorithm we identify $v_{\mathrm{mid}}(T_X)$ with high probability over
subsources $X' \subset X$, $Y' \subset Y$. This sounds fantastic: having found $v_{\mathrm{mid}}(T_X)$, we can
simply output $\mathrm{BExt}(X'_{v_{\mathrm{mid}}(T_X)}, Y')$. Unfortunately, however, the only way we know how to
find $v_{\mathrm{mid}}(T_X)$ requires us to fix $\mathrm{left}(X'_{v_{\mathrm{mid}}(T_X)})$. That is, once found, $X'_{v_{\mathrm{mid}}(T_X)}$ is no longer a
block-source. Moreover, to find $v_{\mathrm{mid}}(T_X)$ we also have to fix $\mathrm{left}(Y'_{u_{\mathrm{mid}}(T_Y)})$.

We elaborate on how to find $v_{\mathrm{mid}}(T_X)$ in Section 3.4. Then in Section 3.5, we show how to
determine the output of the sub-extractor even after losing the block-structure of $X_{v_{\mathrm{mid}}(T_X)}$.

3.4 Finding $v_{\mathrm{mid}}(T_X)$

The node-path challenge and $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)$

Given $x, y \in \{0,1\}^n$, the key idea we use for identifying $v_{\mathrm{mid}}(T_X)$ on $p_{\mathrm{observed}}(x,y)$ lies in the
design of a challenge that we call the node-path challenge. Let v be a node in $T_X$, and let
$q = u_0, \ldots, u_{\log(n)-1}$ be a root-to-leaf path in $T_Y$. We define the challenge $\mathrm{NodePathCh}(x_v, y_q)$
to be the $\log(n)$-row Boolean matrix such that for $i = 0, 1, \ldots, \log(n)-1$,

\[ \mathrm{NodePathCh}(x_v, y_q)_i = \mathrm{BExt}(y_{u_i}, x_v). \]

We define $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)$ to be the node v on $p_{\mathrm{observed}}(x,y)$ with the largest depth such that

\[ \mathrm{Response}(x, y, \mathrm{NodePathCh}(x_v, y_{q_{\mathrm{observed}}(x,y)})) = \mathrm{hasEntropy}. \tag{3.3} \]
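
The two definitions above translate into a short procedure: build one row per node of the observed path of y, then scan the observed path of x from its deepest node upwards. The sketch below is illustrative only; `bext` and `response` are caller-supplied stand-ins for BExt and Response, and the 'L'/'R' string encoding of nodes is a hypothetical convenience, not notation from the construction.

```python
from typing import Callable, List, Optional, Sequence


def node_substring(x: str, v: str) -> str:
    """Block x_v for the node v, encoded as a string of 'L'/'R' steps from the root."""
    for step in v:
        half = len(x) // 2
        x = x[:half] if step == "L" else x[half:]
    return x


def node_path_challenge(
    x_v: str, y: str, q_path: Sequence[str], bext: Callable[[str, str], str]
) -> List[str]:
    """Row i is BExt(y_{u_i}, x_v), where u_i is the i-th node of q_path."""
    return [bext(node_substring(y, u), x_v) for u in q_path]


def observed_mid_node(
    x: str,
    y: str,
    p_path: Sequence[str],
    q_path: Sequence[str],
    bext: Callable[[str, str], str],
    response: Callable[[str, str, List[str]], str],
) -> Optional[str]:
    """Deepest node v on p_path whose node-path challenge is left unresponded."""
    for v in sorted(p_path, key=len, reverse=True):  # deepest node first
        ch = node_path_challenge(node_substring(x, v), y, q_path, bext)
        if response(x, y, ch) == "hasEntropy":
            return v
    return None  # no node on the path satisfies Equation (3.3)
```

Scanning from the deepest node and stopping at the first success is the same as "going up" the path until Equation (3.3) first holds.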


Figure 3: The node-path challenge.

Ideally, we would want to prove that $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y) = v_{\mathrm{mid}}(T_X)$ for any $(x,y) \in \mathrm{supp}((X,Y))$.
By now we know that this is too much to ask for, and in any case, it suffices to prove that
there exist low-deficiency subsources $X' \subset X$, $Y' \subset Y$ such that with high probability over
$(x,y) \sim (X',Y')$ it holds that $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y) = v_{\mathrm{mid}}(T_X)$. Unfortunately, we will not be able
to prove that either. What we will be able to show is that there exist strings $\alpha, \beta$ such that
the following holds. Define

\[ X_\alpha = X \mid (X_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))} = \alpha), \]
\[ Y_\beta = Y \mid (Y_{\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))} = \beta), \]

and let $i_{\mathrm{mid}}(T_X)$ denote the depth of $v_{\mathrm{mid}}(T_X)$.

The way we choose $\alpha, \beta$ is with respect to the error that we constantly ignore throughout
this overview. Thus, assume that $\alpha, \beta$ are chosen in such a way that allows us to continue
ignoring the error. No further requirement is posed on $\alpha, \beta$.

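Proposition 3.1. There exist low-deficiency subsources $X_{\alpha,\beta} \subset X_\alpha$, $Y_{\alpha,\beta} \subset Y_\beta$, such that
with high probability over $(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$, it holds that

\[ \forall i > i_{\mathrm{mid}}(T_X) \quad \mathrm{Response}(x, y, \mathrm{NodePathCh}(x_{v_i(x,y)}, y_{q_{\mathrm{observed}}(x,y)})) = \mathrm{fixed}, \]

\[ \mathrm{Response}(x, y, \mathrm{NodePathCh}(x_{v_{\mathrm{mid}}(T_X)}, y_{q_{\mathrm{observed}}(x,y)})) = \mathrm{hasEntropy}. \]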

Note that by the way we defined $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)$, Proposition 3.1 yields that $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y) =
v_{\mathrm{mid}}(T_X)$ with high probability over $(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$. In particular, this gives us an
algorithm for computing $v_{\mathrm{mid}}(T_X)$: simply go up the computed path $p_{\mathrm{observed}}(x,y)$ until
a node v is found for which Equation (3.3) holds. In the rest of this section we prove
Proposition 3.1.


The challenges of descendants of $v_{\mathrm{mid}}(T_X)$ on $p_{\mathrm{observed}}(x,y)$ are properly responded

Proposition 3.1 has two parts. First, it states that the node-path challenges associated with
nodes below $v_{\mathrm{mid}}(T_X)$ on the path $p_{\mathrm{observed}}(x,y)$ are responded with high probability over
the samples from some low-deficiency subsources of $X_\alpha$, $Y_\beta$. Second, Proposition 3.1 argues
that the node-path challenge associated with $v_{\mathrm{mid}}(T_X)$ is unresponded with high probability
over the samples.

Let us first consider the nodes below $v_{\mathrm{mid}}(T_X)$ on $p_{\mathrm{observed}}(x,y)$. Naturally, we want to
use the challenge-response mechanism. For that we must find low-deficiency subsources
$X'_\alpha \subset X_\alpha$, $Y'_\beta \subset Y_\beta$ such that for all $i > i_{\mathrm{mid}}(T_X)$, the challenge

\[ \mathrm{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y'_\beta)}, (Y'_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y'_\beta)}) \]

is fixed to a constant. As was done in the analysis of the toy algorithm from Section 2.4, to
this end it is enough that the random variable

\[ \mathrm{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}, (Y_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y_\beta)}) \tag{3.4} \]

is a deterministic function of $Y_\beta$. Indeed, in such case and since the challenge consists of
a relatively small number of bits, we can apply Fact 2.5 to find a low-deficiency subsource
$Y'_\beta \subset Y_\beta$ such that Equation (3.4) is fixed to a constant.

For $i > i_{\mathrm{mid}}(T_X)$, our starting point is the random variable

\[ \mathrm{NodePathCh}((X_\alpha)_{v_i(X_\alpha, Y_\beta)}, (Y_\beta)_{q_{\mathrm{observed}}(X_\alpha, Y_\beta)}). \]

To make sure that this random variable depends solely on $Y_\beta$, we need to show that the
dependence in all three syntactical appearances of $X_\alpha$ can be removed. We start with
$q_{\mathrm{observed}}(X_\alpha, Y_\beta)$.

Claim 3.2. There exists a deficiency $\log n$ subsource $X'_\alpha$ of $X_\alpha$ such that $q_{\mathrm{observed}}(X'_\alpha, Y_\beta)$ is
fixed to a constant.

Proof. Let $i_{\mathrm{bot}}(T_Y)$ denote the depth of $u_{\mathrm{bot}}(T_Y)$. To prove the claim, we first recall that the
path $q_{\mathrm{observed}}(X_\alpha, Y_\beta)$ contains the entropy-path of $T_Y$. In particular, we have that the nodes
$u_0(X_\alpha, Y_\beta), \ldots, u_{i_{\mathrm{bot}}(T_Y)}(X_\alpha, Y_\beta)$ are fixed. It is left to argue that there is a low-deficiency
subsource $X'_\alpha \subset X_\alpha$ such that the remaining nodes $u_{i_{\mathrm{bot}}(T_Y)+1}(X'_\alpha, Y_\beta), \ldots, u_{\log(n)-1}(X'_\alpha, Y_\beta)$
are fixed as well.

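Let us first consider the node $u_{i_{\mathrm{bot}}(T_Y)+1}(X_\alpha, Y_\beta)$ that is the son of $u_{i_{\mathrm{bot}}(T_Y)}(X_\alpha, Y_\beta) =
u_{\mathrm{bot}}(T_Y)$. According to Equation (3.2), node $u_{\mathrm{bot}}(T_Y)$ decides which of its two sons will be
on $q_{\mathrm{observed}}(X_\alpha, Y_\beta)$ according to whether or not

\[ \mathrm{Response}((Y_\beta)_{u_{\mathrm{bot}}(T_Y)}, X_\alpha, \mathrm{Challenge}((Y_\beta)_{\mathrm{leftSon}(u_{\mathrm{bot}}(T_Y))}, X_\alpha)) = \mathrm{hasEntropy}. \tag{3.5} \]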

By the definition of an entropy-tree, $u_{\mathrm{bot}}(T_Y)$ is a descendant of $\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))$. Further,
by definition, $(Y_\beta)_{\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))}$ is fixed to $\beta$. Thus, also $(Y_\beta)_{u_{\mathrm{bot}}(T_Y)}$ and $(Y_\beta)_{\mathrm{leftSon}(u_{\mathrm{bot}}(T_Y))}$
are fixed to some constants. Therefore, the Boolean expression in Equation (3.5) is a
deterministic function of $X_\alpha$. By applying Fact 2.5, we obtain a deficiency 1 subsource X' of $X_\alpha$
such that the Boolean expression in Equation (3.5) is fixed. In particular, $u_{i_{\mathrm{bot}}(T_Y)+1}(X', Y_\beta)$
is fixed to a constant.

At this point we can apply the same argument to $i_{\mathrm{bot}}(T_Y) + 2$. Indeed, $u_{i_{\mathrm{bot}}(T_Y)+1}(X', Y_\beta)$
is fixed to a constant and all appearances of $Y_\beta$ in the Boolean expression that is analogous to
Equation (3.5) are again fixed to constants for the same reason as before. Since this process
terminates after at most $\log n$ steps and since in each iteration we move to a deficiency 1
subsource of the previously obtained subsource, the claim follows. □

Given Claim 3.2, we turn to show that for any $i > i_{\mathrm{mid}}(T_X)$,

\[ \mathrm{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}, (Y_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y_\beta)}) \]

is a deterministic function of $Y_\beta$. By the discussion above, this will prove the first part of
Proposition 3.1.

By Claim 3.2, we already know that $q_{\mathrm{observed}}(X'_\alpha, Y_\beta)$ is fixed to a constant. Thus, it
suffices to show that $(X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}$ is a deterministic function of $Y_\beta$ for all $i > i_{\mathrm{mid}}(T_X)$. By
an argument similar to the one used in the proof of Claim 3.2, one can show that for any
such i, $v_i(X'_\alpha, Y_\beta)$ is a deterministic function of $Y_\beta$. Note further that, by the definition
of an entropy-tree, since $i > i_{\mathrm{mid}}(T_X)$ we have that $v_i(X'_\alpha, Y_\beta)$ is always (that is, for every
$(x,y) \in \mathrm{supp}((X'_\alpha, Y_\beta))$) a descendant of $\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))$. Since $(X'_\alpha)_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}$ is fixed
to a constant we conclude that $(X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}$ is a deterministic function of $Y_\beta$.

By the discussion above, we are now in a position to apply Fact 2.5 so as to obtain a
low-deficiency subsource $Y'_\beta \subset Y_\beta$ such that

\[ \mathrm{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y'_\beta)}, (Y'_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y'_\beta)}) \]

is fixed to a constant. We can then apply the challenge-response mechanism to show that
there exist low-deficiency subsources $X_{\alpha,\beta} \subset X'_\alpha$, $Y_{\alpha,\beta} \subset Y'_\beta$ such that for any $(x,y) \in
\mathrm{supp}((X_{\alpha,\beta}, Y_{\alpha,\beta}))$ it holds that

\[ \forall i > i_{\mathrm{mid}}(T_X) \quad \mathrm{Response}(x, y, \mathrm{NodePathCh}(x_{v_i(x,y)}, y_{q_{\mathrm{observed}}(x,y)})) = \mathrm{fixed}. \]

The challenge of $v_{\mathrm{mid}}(T_X)$ is unresponded

To prove Proposition 3.1, it is left to show that the node-path challenge associated with
$v_{\mathrm{mid}}(T_X)$ is unresponded. More precisely, it suffices to show that with high probability over
$(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$, it holds that

\[ \mathrm{Response}(x, y, \mathrm{NodePathCh}(x_{v_{\mathrm{mid}}(T_X)}, y_{q_{\mathrm{observed}}(x,y)})) = \mathrm{hasEntropy}. \]

Since $u_{\mathrm{top}}(T_Y)$ is on the path $q_{\mathrm{observed}}(X_{\alpha,\beta}, Y_{\alpha,\beta})$, the matrix

\[ \mathrm{NodePathCh}((X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)}, (Y_{\alpha,\beta})_{q_{\mathrm{observed}}(X_{\alpha,\beta}, Y_{\alpha,\beta})}) \]

contains the row

\[ \mathrm{BExt}((Y_{\alpha,\beta})_{u_{\mathrm{top}}(T_Y)}, (X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)}). \tag{3.7} \]


Since $X_{v_{\mathrm{mid}}(T_X)}$ is a block-source, $(X_\alpha)_{v_{\mathrm{mid}}(T_X)}$ has a significant amount of entropy. Indeed,
$X_\alpha$ is obtained from X by fixing $X_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))} = \mathrm{left}(X_{v_{\mathrm{mid}}(T_X)})$. Therefore, by Fact 2.5,
$(X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)}$ also has a significant amount of entropy.

We now observe that $(Y_{\alpha,\beta})_{u_{\mathrm{top}}(T_Y)}$ is a block-source. Indeed, $Y_{u_{\mathrm{top}}(T_Y)}$ is a block-source and
$Y_\beta$ is obtained from Y by fixing $Y_{\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))}$. Since $Y_{u_{\mathrm{mid}}(T_Y)}$ is a block-source, this fixing
leaves some entropy in $(Y_\beta)_{u_{\mathrm{mid}}(T_Y)}$. Recall further that $(Y_\beta)_{u_{\mathrm{mid}}(T_Y)}$ lies inside $\mathrm{left}((Y_\beta)_{u_{\mathrm{top}}(T_Y)})$
as $u_{\mathrm{mid}}(T_Y)$ is a descendant of $\mathrm{leftSon}(u_{\mathrm{top}}(T_Y))$. Thus, we see that $(Y_{\alpha,\beta})_{u_{\mathrm{top}}(T_Y)}$ is a
block-source.

Consider now any low-deficiency subsources $\hat{X} \subset X_{\alpha,\beta}$, $\hat{Y} \subset Y_{\alpha,\beta}$. By Fact 2.5 and by
Lemma 2.6 we have that $\hat{X}_{v_{\mathrm{mid}}(T_X)}$ has a significant amount of entropy and that $\hat{Y}_{u_{\mathrm{top}}(T_Y)}$
is a block-source (with some deterioration in parameters). Thus, for any low-deficiency
subsources $\hat{X}$, $\hat{Y}$ of $X_{\alpha,\beta}$, $Y_{\alpha,\beta}$, respectively, we have that the challenge matrix associated
with $v_{\mathrm{mid}}(T_X)$ contains a row that is close to uniform. In particular this matrix is close
to having high entropy. Thus, by the challenge-response mechanism, we have that the
node-path challenge associated with $v_{\mathrm{mid}}(T_X)$ is unresponded with high probability over
$(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$, as desired.

3.5 Determining the output 

At the last step of the algorithm we compute the output of the sub-extractor. The output
of the sub-extractor is defined as

\[ \mathrm{BExt}(x_{v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)} \circ x, \; y), \]

where by $x_{v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)} \circ x$ we denote the block-source with first block $x_{v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)}$ and second
block that equals x. Technically, we need to pad the first block with zeros so that both
blocks will have the same length, and also pad y with zeros.

There are two potential problems with applying BExt the way we do above. First, we see
that the block-source fed to BExt depends on the sample y, which is problematic since y is
used as a sample from the weak-source as well. This, however, is a non-issue. Indeed, recall
that with high probability over $(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$ it holds that $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y) = v_{\mathrm{mid}}(T_X)$,
and so ignoring a small error, the computation of the extractor BExt above is the same as

\[ \mathrm{BExt}(x_{v_{\mathrm{mid}}(T_X)} \circ x, \; y). \]

Now that we have shown that there are no dependencies between the two samples fed to 
BExt, we only need to make sure that the first sample is indeed coming from a block-source 
when sampling $(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$.

To see why this is true, recall that $v_{\mathrm{mid}}(T_X)$ is a descendant of $\mathrm{leftSon}(v_{\mathrm{top}}(T_X))$ and
that $X_{v_{\mathrm{top}}(T_X)}$ is a block-source. Since $X_{\alpha,\beta}$ is obtained from X by fixing $X_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}$
(and by moving to low-deficiency subsources) and since $X_{v_{\mathrm{mid}}(T_X)}$ is a block-source, we have
that $(X_{\alpha,\beta})_{v_{\mathrm{top}}(T_X)}$ is also a block-source. Therefore, $(X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)} \circ X_{\alpha,\beta}$ is also a
block-source. This shows that the application of BExt above is valid, and that the output is close
to uniform with high probability over $(X_{\alpha,\beta}, Y_{\alpha,\beta})$.

4 Preliminaries 

4.1 Standard (and less standard) notations and definitions 

The logarithm in this paper is always taken base 2. For every natural number $n \geq 1$, define
$[n] = \{1, 2, \ldots, n\}$.

Strings and matrices. Let n be an integer that is a power of 2, namely $n = 2^m$ for
some non-negative integer m. Let $x \in \{0,1\}^n$. For $i \in [n]$, we let $x_i$ denote the i’th bit
of x. For $\emptyset \neq I \subseteq [n]$, we let $x_I$ denote the projection of x to the coordinates in I. We
denote by left(x) the n/2 leftmost bits of x and by right(x) the n/2 rightmost bits of x. That
is, $\mathrm{left}(x) = x_1 \cdots x_{n/2}$ and $\mathrm{right}(x) = x_{(n/2)+1} \cdots x_n$. We denote the concatenation of two
strings x, y by $x \circ y$. Given an $r \times n$ matrix x, for $i = 0, 1, \ldots, r-1$, we let $x_i$ denote row i
of x.

Trees. Let T be a complete rooted binary tree. We denote the root of T by root(T). 
Throughout the paper we consider trees where some of the nodes are labeled by labels from 
a ground set L. If v is a labeled node in a tree T, we denote its label by label(v). If v is a
non-leaf in T, we denote the left and right sons of v by leftSon(v), rightSon(v), respectively.
If v is not the root of T, parent(v) denotes the parent of v. The depth of T is denoted by
depth(T). The depth of a node v in T, denoted by depth(v), is the distance in edges from
root(T) to v. Note that depth(root(T)) = 0.

Random variables and distributions. We sometimes abuse notation and syntactically
treat random variables and their distribution as equal. Let X, Y be two random variables.
We say that Y is a deterministic function of X if the value of X determines the value of Y.
Namely, there exists a function f such that Y = f(X).

Associating strings with trees. Let n be a power of 2 and let $x \in \{0,1\}^n$. The tree
that is associated with x, denoted by $T_x$, is a depth $\log n$ complete rooted binary tree, where
with each node v of $T_x$ we associate a substring $x_v$ of x as follows:

• $x_{\mathrm{root}(T_x)} = x$.

• For $v \neq \mathrm{root}(T_x)$, if v is the left son of its parent, then $x_v = \mathrm{left}(x_{\mathrm{parent}(v)})$; otherwise,
$x_v = \mathrm{right}(x_{\mathrm{parent}(v)})$.
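
The association of substrings with tree nodes can be made concrete with a few lines of code. The sketch below assumes a hypothetical encoding of a node as the string of 'L'/'R' steps leading to it from the root (the empty string is the root); this encoding is an illustration device and not notation used elsewhere in the paper.

```python
def node_substring(x: str, path: str) -> str:
    """Return x_v for the node v reached from the root by following `path`.

    'L' moves to the left son (keep the left half of the current block),
    'R' moves to the right son (keep the right half).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("len(x) must be a power of 2")
    for step in path:
        if step not in ("L", "R"):
            raise ValueError("path may only contain 'L' and 'R'")
        if len(x) == 1:
            raise ValueError("path is deeper than the tree")
        half = len(x) // 2
        x = x[:half] if step == "L" else x[half:]
    return x
```

For example, with x = 01101001 the left son of the root is associated with 0110, and the right son of that node with 10.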

Statistical distance. The statistical distance between two distributions X, Y on a
common domain D is defined by

\[ \mathrm{SD}(X, Y) = \max_{A \subseteq D} \left\{ \left| \Pr[X \in A] - \Pr[Y \in A] \right| \right\}. \]

If $\mathrm{SD}(X, Y) \leq \varepsilon$ we say that X is $\varepsilon$-close to Y.
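
For small explicit distributions the definition can be evaluated directly. The sketch below represents a distribution as a dictionary from outcomes to probabilities (a representation chosen here for illustration) and computes the statistical distance in two ways: by maximizing over all events A, as in the definition, and by the well-known equivalent formula of half the L1 distance.

```python
from itertools import chain, combinations
from typing import Dict


def sd_by_max(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Maximum of |Pr[X in A] - Pr[Y in A]| over all events A (exponential time)."""
    support = sorted(set(p) | set(q))
    events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    return max(
        abs(sum(p.get(a, 0.0) for a in A) - sum(q.get(a, 0.0) for a in A))
        for A in events
    )


def sd_half_l1(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance, which equals the maximum over events."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)
```

The maximum is attained by the event containing exactly the outcomes on which X is more likely than Y, which is why the two functions agree.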


Min-entropy. The min-entropy of a random variable X is defined by

\[ H_\infty(X) = \min_{x \in \mathrm{supp}(X)} \log \frac{1}{\Pr[X = x]}. \]

If X is supported on $\{0,1\}^n$, we define the min-entropy rate of X by $H_\infty(X)/n$. In such
case, if X has min-entropy k or more, we say that X is an $(n,k)$-weak-source.
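
As a quick illustration, min-entropy is determined by the single most likely outcome. The sketch below uses a dictionary from n-bit strings to probabilities as a stand-in for a distribution; the helper names are chosen here for illustration.

```python
import math
from typing import Dict


def min_entropy(p: Dict[str, float]) -> float:
    """log2 of 1 over the probability of the heaviest outcome."""
    return -math.log2(max(p.values()))


def is_weak_source(p: Dict[str, float], n: int, k: float) -> bool:
    """True if p is supported on n-bit strings and has min-entropy at least k."""
    on_n_bits = all(len(x) == n for x, pr in p.items() if pr > 0)
    return on_n_bits and min_entropy(p) >= k
```

Note that a distribution can have large Shannon entropy and still have small min-entropy if one outcome carries noticeable weight.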

4.2 Li’s block-source—weak-source extractor 

Let X be a random variable on n bit strings, and assume n is even. We say that X is an
$(n,k)$-block-source if the following holds:

• $H_\infty(\mathrm{left}(X)) \geq k$.

• For any $x \in \mathrm{supp}(\mathrm{left}(X))$ it holds that $H_\infty(\mathrm{right}(X) \mid \mathrm{left}(X) = x) \geq k$.

We sometimes omit the length n of X and say that X is a k-block-source.
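
The two conditions can be checked mechanically on a small explicit distribution. The following sketch (with a dictionary from even-length bit strings to probabilities standing in for X, and a helper name chosen here for illustration) returns the largest k for which both conditions hold.

```python
import math
from collections import defaultdict
from typing import Dict


def block_source_parameter(p: Dict[str, float]) -> float:
    """Largest k such that the distribution p is a k-block-source."""
    left_mass = defaultdict(float)      # Pr[left(X) = l]
    heaviest_pair = defaultdict(float)  # max over r of Pr[X = l followed by r]
    for x, pr in p.items():
        if pr <= 0:
            continue
        half = len(x) // 2
        l = x[:half]
        left_mass[l] += pr
        heaviest_pair[l] = max(heaviest_pair[l], pr)
    # First condition: min-entropy of the left block.
    k_left = -math.log2(max(left_mass.values()))
    # Second condition: min-entropy of the right block given each left value.
    k_right = min(-math.log2(heaviest_pair[l] / left_mass[l]) for l in left_mass)
    return min(k_left, k_right)
```

A source whose right block is a copy of its left block has parameter 0 even though the left block alone may have high min-entropy, which is why the second condition is stated conditionally.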

In a recent breakthrough, Li [Li15] gave a construction of an extractor for two n-bit
sources, where the first source is a polylog(n)-block-source and the second is a weak-source
with min-entropy polylog(n). Our construction heavily relies on Li’s extractor.

Theorem 4.1 ([Li15]). There exists a universal constant $\gamma > 0$ such that the following holds.
For all integers n, k with $k > \log^{12} n$, there is a poly(n)-time computable function

\[ \mathrm{BExt} : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^m \]

such that if X is a k-block-source, where each block is on n/2 bits, and Y is an independent
$(n,k)$-source, then

\[ \mathrm{SD}((\mathrm{BExt}(X,Y), Y), (U_m, Y)) \leq \varepsilon, \]

and

\[ \mathrm{SD}((\mathrm{BExt}(X,Y), X), (U_m, X)) \leq \varepsilon, \]

where $m = 0.9k$ and $\varepsilon =$

Let t < n be even integers. We sometimes apply BExt on strings $x \in \{0,1\}^t$ and
$y \in \{0,1\}^n$ and write BExt(x, y). Formally, we actually compute BExt(x', y) where x' is
obtained by appending (n − t)/2 zeros before and after x. This way of padding x preserves
the block-structure of x.
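
The padding rule is simple enough to state in code. In the sketch below (the function name is chosen here for illustration), half of the n − t zeros go before x and half after, so the left half of the padded string ends with left(x) and the right half starts with right(x).

```python
def pad_for_bext(x: str, n: int) -> str:
    """Pad the t-bit string x to n bits with (n - t)/2 zeros on each side."""
    t = len(x)
    if t % 2 or n % 2 or t > n:
        raise ValueError("t and n must be even with t <= n")
    zeros = "0" * ((n - t) // 2)
    return zeros + x + zeros
```

Padding only on one side would instead shift bits of one block of x into the other half of the padded string, which is what the symmetric rule avoids.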

4.3 Subsources 

The notion of a subsource was first explicitly introduced and studied by Barak et al. [BKS+10].
We start by giving the definition of a subsource and then collect some facts about subsources. 

Definition 4.2 (Subsource). Given random variables X and X' on $\{0,1\}^n$, we say that X'
is a deficiency d subsource of X and write $X' \subset X$ if there exists a set $A \subseteq \{0,1\}^n$ such that
$(X \mid A) = X'$ and $\Pr[X \in A] \geq 2^{-d}$. More precisely, for every $a \in A$, $\Pr[X' = a]$ is defined
by $\Pr[X = a \mid X \in A]$ and for $a \notin A$, $\Pr[X' = a] = 0$.

Fact 4.3 ([BRSW12], Fact 3.11). If X is an $(n,k)$-source and X' is a deficiency d subsource
of X then X' is an $(n, k-d)$-source.

Fact 4.4 ([BRSW12], Fact 3.13). Let X be a random variable on n-bit strings. Let $f : \{0,1\}^n \to
\{0,1\}^m$ be a function. Then, there exists $a \in \{0,1\}^m$ and a deficiency m subsource X' of X
such that $f(x) = a$ for every $x \in \mathrm{supp}(X')$.
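
Both the definition of a subsource and the two facts above can be illustrated on small explicit distributions. The sketch below (function names chosen here for illustration, distributions given as dictionaries) conditions on a set A and reports the resulting deficiency, and then fixes a function f by conditioning on its most likely value, which is one natural way to realize Fact 4.4.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


def condition(p: Dict[str, float], A: Set[str]) -> Tuple[Dict[str, float], float]:
    """Return the subsource (X | X in A) and its deficiency log2(1 / Pr[X in A])."""
    mass = sum(pr for x, pr in p.items() if x in A)
    if mass == 0:
        raise ValueError("A has probability zero")
    sub = {x: pr / mass for x, pr in p.items() if x in A}
    return sub, -math.log2(mass)


def fix_function(
    p: Dict[str, float], f: Callable[[str], str]
) -> Tuple[str, Dict[str, float], float]:
    """Condition on the most likely value a of f(X); f is constant on the result."""
    mass_of = defaultdict(float)
    for x, pr in p.items():
        mass_of[f(x)] += pr
    a = max(mass_of, key=mass_of.get)
    sub, deficiency = condition(p, {x for x in p if f(x) == a})
    return a, sub, deficiency
```

Since f takes at most 2^m values, its most likely value has probability at least 2^{-m}, so the deficiency returned by `fix_function` is at most m; the min-entropy then drops by at most the deficiency, as in Fact 4.3.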

Lemma 4.5 ([BRSW12], Corollary 3.19). Let X be a k-block-source, and let X' be a
deficiency d subsource of X. Then, X' is $\varepsilon$-close to being a $(k - d - \log(1/\varepsilon) - 1)$-block-source.

5 The Challenge-Response Mechanism 

In this section we further abstract the challenge-response mechanism that was introduced
in [BKS+10] and refined by [BRSW12]. This abstraction will make it easier for us to apply the
mechanism in our proofs. The reader is referred to Section 2 for an intuitive-level overview
of the mechanism.

Theorem 5.1. For integers $\ell < n$, there exists a poly(n)-time computable function

\[ \mathrm{Resp} : \{0,1\}^n \times \{0,1\}^n \times \{0,1\}^\ell \to \{\mathrm{fixed}, \mathrm{hasEntropy}\} \]

with the following property. For any two independent $(n,k)$-sources X, Y with $k > \log^{10} n$
and for any function $\mathrm{Challenge} : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^\ell$, the following holds:

• If Challenge(X, Y) is fixed to a constant then there exist deficiency $2\ell$ subsources $X' \subset
X$, $Y' \subset Y$, such that

\[ \Pr_{(x,y) \sim (X',Y')} [\mathrm{Resp}(x, y, \mathrm{Challenge}(x, y)) = \mathrm{fixed}] = 1. \]

• If for any deficiency $20\ell$ subsources $\hat{X} \subset X$, $\hat{Y} \subset Y$ it holds that $\mathrm{Challenge}(\hat{X}, \hat{Y})$ is
$\varepsilon$-close to having min-entropy k, then

\[ \Pr_{(x,y) \sim (X,Y)} [\mathrm{Resp}(x, y, \mathrm{Challenge}(x, y)) = \mathrm{fixed}] \leq (2^{-k} + \varepsilon) \cdot \mathrm{poly}(n). \]


The proof of Theorem 5.1 readily follows from the following theorem. 

Theorem 5.2 (Theorem 4.3, [BRSW12]). There exist universal constants $\gamma, c$ such that for
any integer n, there exists a poly(n)-time computable function

\[ \mathrm{SE} : \{0,1\}^n \times \{0,1\}^n \to (\{0,1\}^\ell)^r, \]

with $\ell = \gamma k$ and $r = n^c$, such that the following holds. For any independent $(n,k)$-sources
X, Y, with $k > \log^{10} n$, it holds that:

• Let a be any fixed $\ell$ bit string. Then, there are deficiency $2\ell$ subsources $X_a \subset X$, $Y_a \subset Y$ and an
index $i \in [r]$ such that $\Pr[\mathrm{SE}(X_a, Y_a)_i = a] = 1$.

• Given any particular row index $i \in [r]$, (X, Y) is $2^{-10\ell}$-close to a convex combination
of subsources such that for every $(\hat{X}, \hat{Y})$ in the combination,

— $\hat{X}$ is a deficiency $20\ell$ subsource of X.

— $\hat{Y}$ is a deficiency $20\ell$ subsource of Y.

— $\hat{X}$, $\hat{Y}$ are independent.

— $\mathrm{SE}(\hat{X}, \hat{Y})_i$ is fixed to a constant.

Proof of Theorem 5.1. We first describe the algorithm for computing the response function $\mathsf{Response}(x, y, \mathsf{Challenge}(x, y))$ on input $x, y \in \{0,1\}^n$. The algorithm computes $\mathsf{SE}(x, y)$, where the output length of $\mathsf{SE}$ is set to $\ell$. The algorithm then checks whether or not $\mathsf{Challenge}(x, y)$ appears as a row in $\mathsf{SE}(x, y)$. If so the algorithm outputs fixed; otherwise it outputs hasEntropy.

We turn to the analysis. Assume first that $\mathsf{Challenge}(X, Y)$ is fixed. By Theorem 5.2, there exist deficiency $2\ell$ subsources $X' \subset X$, $Y' \subset Y$ and an index $i \in [r]$ such that with probability 1 over $(x,y) \sim (X',Y')$ it holds that $\mathsf{SE}(x,y)_i = \mathsf{Challenge}(x, y)$, thus proving the first part of the theorem.

Assume now that for any deficiency $20\ell$ subsources $\hat{X} \subset X$, $\hat{Y} \subset Y$, it holds that $\mathsf{Challenge}(\hat{X}, \hat{Y})$ is $\varepsilon$-close to having min-entropy $k$. Consider any fixed $i \in [r]$. By Theorem 5.2, $(X, Y)$ is $2^{-10\ell}$-close to a convex combination of subsources such that every $(\hat{X}, \hat{Y})$ in the combination has the four listed properties. In particular, $\mathsf{SE}(\hat{X}, \hat{Y})_i$ is fixed to a constant. Since $\mathsf{Challenge}(\hat{X}, \hat{Y})$ is $\varepsilon$-close to having min-entropy $k$, it hits this constant with probability at most $2^{-k} + \varepsilon$, and so

$$\Pr_{(x,y) \sim (\hat{X},\hat{Y})} [\mathsf{Challenge}(x, y) = \mathsf{SE}(x, y)_i] \le 2^{-k} + \varepsilon.$$

Accounting for the distance from $(X, Y)$ to the convex combination,

$$\Pr_{(x,y) \sim (X,Y)} [\mathsf{Challenge}(x, y) = \mathsf{SE}(x, y)_i] \le 2^{-k} + \varepsilon + 2^{-10\ell}.$$

Therefore, by the union bound over all $i \in [r]$,

$$\Pr_{(x,y) \sim (X,Y)} [\exists i \in [r] \;\; \mathsf{Challenge}(x, y) = \mathsf{SE}(x, y)_i] \le (2^{-k} + \varepsilon + 2^{-10\ell}) \cdot r.$$

As $r = \mathrm{poly}(n)$ and since $k \le \ell \le 10\ell$, the proof follows. □
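The response function described in the proof of Theorem 5.1 is a row-membership test, which can be sketched as follows. This is only an illustrative sketch: `toy_se` is a hypothetical placeholder with none of the guarantees of the somewhere extractor $\mathsf{SE}$ of Theorem 5.2, and bit strings are modeled as Python strings of `'0'`/`'1'` characters.

```python
from typing import Callable, List

Row = str  # an ell-bit string such as "0110"


def response(x: str, y: str, challenge: Row,
             se: Callable[[str, str], List[Row]]) -> str:
    """Output 'fixed' iff the challenge appears as a row of SE(x, y)."""
    rows = se(x, y)
    return "fixed" if challenge in rows else "hasEntropy"


def toy_se(x: str, y: str) -> List[Row]:
    """Hypothetical stand-in for SE (NOT a somewhere extractor).

    It returns a table with r = 2 rows of ell = 2 bits each, only so that
    the membership test above has something to look at.
    """
    return [x[:2], y[:2]]
```

For instance, `response("0110", "1001", "01", toy_se)` reports `fixed` because `"01"` is a row of the toy table, whereas the challenge `"11"` is reported as `hasEntropy`.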


6 Entropy-Trees and Tree-Structured Sources 

Definition 6.1. An entropy-tree $T$ is a complete rooted binary tree where some of the nodes of the tree are labeled by one of the following labels: $\mathsf{F}$, $\mathsf{H}$, $\mathsf{B}_{\mathrm{top}}$, $\mathsf{B}_{\mathrm{mid}}$, $\mathsf{B}_{\mathrm{bot}}$, according to the following set of rules:

• $\mathrm{label}(\mathrm{root}(T)) \in \{\mathsf{H}, \mathsf{B}_{\mathrm{top}}\}$.

• There is exactly one node in $T$ that is labeled by $\mathsf{B}_{\mathrm{top}}$, one node that is labeled by $\mathsf{B}_{\mathrm{mid}}$, and one node labeled by $\mathsf{B}_{\mathrm{bot}}$, denoted by $v_{\mathrm{top}}(T)$, $v_{\mathrm{mid}}(T)$ and $v_{\mathrm{bot}}(T)$, respectively. Further, $v_{\mathrm{mid}}(T)$ is a descendant of $\mathrm{leftSon}(v_{\mathrm{top}}(T))$, and $v_{\mathrm{bot}}(T)$ is a descendant of $\mathrm{leftSon}(v_{\mathrm{mid}}(T))$. We denote $i_{\mathrm{top}}(T) = \mathrm{depth}(v_{\mathrm{top}}(T))$, $i_{\mathrm{mid}}(T) = \mathrm{depth}(v_{\mathrm{mid}}(T))$, and $i_{\mathrm{bot}}(T) = \mathrm{depth}(v_{\mathrm{bot}}(T))$.

• If $v$ is a non-leaf that has no label or otherwise is labeled by $\mathsf{F}$ or $\mathsf{B}_{\mathrm{bot}}$ then both its sons have no label.

• If $v$ is a non-leaf labeled by $\mathsf{H}$, $\mathsf{B}_{\mathrm{top}}$ or $\mathsf{B}_{\mathrm{mid}}$ then $\mathrm{leftSon}(v)$ has a label. Further,

- If $\mathrm{label}(\mathrm{leftSon}(v)) = \mathsf{F}$ then $\mathrm{rightSon}(v)$ has a label and $\mathrm{label}(\mathrm{rightSon}(v)) \ne \mathsf{F}$.

- If $\mathrm{label}(\mathrm{leftSon}(v)) \ne \mathsf{F}$ then the right son of $v$ has no label.

In our proofs we consider two sources, each having its own tree-structure. In such cases, we use $v$ to denote a node in one entropy-tree and $u$ to denote a node in the other entropy-tree. So, for example, we will use $v_{\mathrm{top}}(T_X)$ to denote the node in the first entropy-tree labeled by $\mathsf{B}_{\mathrm{top}}$, whereas we use $u_{\mathrm{top}}(T_Y)$ to denote the node in the second entropy-tree labeled by $\mathsf{B}_{\mathrm{top}}$.
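The labeling rules of Definition 6.1 can be checked mechanically, which may help when experimenting with small examples. The following is a minimal sketch under stated assumptions: a node is encoded (hypothetically) by its root-to-node path as a string over `L`/`R`, unlabeled nodes are simply absent from the dictionary, and "descendant of leftSon(v)" is read as including leftSon(v) itself.

```python
from itertools import product

ALL_LABELS = {"F", "H", "Btop", "Bmid", "Bbot"}


def check_entropy_tree(labels: dict, depth: int) -> bool:
    """labels maps a path such as 'LLR' (root is '') to a label;
    leaves are the paths of length `depth`."""
    if any(lab not in ALL_LABELS or len(p) > depth or set(p) - {"L", "R"}
           for p, lab in labels.items()):
        return False
    # Rule 1: the root is labeled H or Btop.
    if labels.get("") not in {"H", "Btop"}:
        return False
    # Rule 2: exactly one Btop, Bmid, Bbot, nested through left sons.
    special = {}
    for name in ("Btop", "Bmid", "Bbot"):
        hits = [p for p, lab in labels.items() if lab == name]
        if len(hits) != 1:
            return False
        special[name] = hits[0]
    if not special["Bmid"].startswith(special["Btop"] + "L"):
        return False
    if not special["Bbot"].startswith(special["Bmid"] + "L"):
        return False
    # Rules 3 and 4: local constraints at every non-leaf.
    for d in range(depth):
        for steps in product("LR", repeat=d):
            p = "".join(steps)
            lab, left, right = labels.get(p), labels.get(p + "L"), labels.get(p + "R")
            if lab in (None, "F", "Bbot"):
                if left is not None or right is not None:
                    return False
            elif left is None:
                return False
            elif left == "F":
                if right is None or right == "F":
                    return False
            elif right is not None:
                return False
    return True
```

For example, the labeling `{"": "H", "L": "F", "R": "Btop", "RL": "Bmid", "RLL": "Bbot"}` of a depth-4 tree passes the check, while labeling the root by `F` does not.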

Definition 6.2 (Entropy-path). Let $T$ be an entropy-tree. The entropy-path of $T$ is the path that starts at $\mathrm{root}(T)$ and ends at $v_{\mathrm{bot}}(T)$. We denote the nodes on this path by $\mathrm{root}(T) = v_0(T), \ldots, v_{i_{\mathrm{bot}}(T)}(T) = v_{\mathrm{bot}}(T)$. We say that a path $p$ in $T$ contains the entropy-path of $T$ if $p$ starts at $\mathrm{root}(T)$ and goes through $v_{\mathrm{bot}}(T)$.

Definition 6.3 (Good son). Let $T$ be an entropy-tree and let $v \ne v_{\mathrm{bot}}(T)$ be an ancestor of $v_{\mathrm{bot}}(T)$. The good son of $v$ is defined to be the unique son of $v$ that is an ancestor of $v_{\mathrm{bot}}(T)$.


Definition 6.4. Let $T$ be an entropy-tree. We say that an $n$-bit random variable $X$ has a $T$-structure with parameters $(k, \varepsilon)$ if the following holds. For any node $v$ in $T$:

• If $\mathrm{label}(v) = \mathsf{F}$ then $X_v$ is fixed to a constant.

• If $\mathrm{label}(v) = \mathsf{B}_{\mathrm{top}}$ then $X_v$ is $\varepsilon$-close to a $k^{1/2}$-block-source.

• If $\mathrm{label}(v) = \mathsf{B}_{\mathrm{mid}}$ then $X_v$ is $\varepsilon$-close to a $k^{1/4}$-block-source.

• If $\mathrm{label}(v) = \mathsf{B}_{\mathrm{bot}}$ then $X_v$ is $\varepsilon$-close to a $k^{1/8}$-block-source.

• If $\mathrm{label}(v) = \mathsf{H}$ then the following holds:

- If $v$ is an ancestor of $v_{\mathrm{top}}(T)$ then $H_\infty(X_v) \ge k$.

- If $v$ is a descendant of $v_{\mathrm{top}}(T)$ and an ancestor of $v_{\mathrm{mid}}(T)$ then $H_\infty(X_v) \ge k^{1/2}$.

- If $v$ is a descendant of $v_{\mathrm{mid}}(T)$ then $H_\infty(X_v) \ge k^{1/4}$.


We further make use of the following lemma, which is analogous to the two-types lemma 
of Barak et al. (see [BRSW12], Lemma 6.8). 

Lemma 6.5 (Three-types lemma). For any $(n,k)$-source $X$ there exists a deficiency 1 subsource $X' \subset X$ such that (at least) one of the following holds:

• $H_\infty(\mathrm{left}(X')) \ge k - \sqrt{k} - 1$.

• $X'$ is a $\sqrt{k}$-block-source.

• $\mathrm{left}(X')$ is fixed to a constant and $H_\infty(\mathrm{right}(X')) \ge k - \sqrt{k} - 1$.


For the proof of Lemma 6.5, we make use of the following “fixing entropies” lemma by Barak et al. [BRSW12]. We state the lemma for a special case (and with slightly stronger parameters, which are easily achievable for that specific case by following the proof of [BRSW12]).

Lemma 6.6 ([BRSW12], Lemma 3.20). Let $X$ be an $(n,k)$-source. Let $0 < \tau_1 < \tau_2 < n$ be any two numbers. Set $\tau_0 = 0$ and $\tau_3 = n$. Then, there exist a deficiency 1 subsource $X' \subset X$ and an index $i \in \{0, 1, 2\}$ such that the following holds:

• For any fixing of $\mathrm{left}(X')$, $H_\infty(\mathrm{right}(X')) \in [\tau_i, \tau_{i+1}]$.

• $H_\infty(\mathrm{left}(X')) + \tau_{i+1} \ge k - 1$.

Proof of Lemma 6.5. Set $\tau_1 = \sqrt{k}$, $\tau_2 = k - \sqrt{k} - 1$ and apply Lemma 6.6 to obtain a deficiency 1 subsource $X'' \subset X$ and $i \in \{0, 1, 2\}$. We consider three cases, according to the value of $i$.

• If $i = 0$ then by the second item of Lemma 6.6, $H_\infty(\mathrm{left}(X'')) + \tau_1 \ge k - 1$. Thus, $H_\infty(\mathrm{left}(X'')) \ge k - \sqrt{k} - 1$. We then take $X' = X''$.

• If $i = 1$ then for any fixing of $\mathrm{left}(X'')$, we have that $H_\infty(\mathrm{right}(X'')) \in [\tau_1, \tau_2] = [\sqrt{k}, k - \sqrt{k} - 1]$. By the second item of Lemma 6.6, $H_\infty(\mathrm{left}(X'')) \ge k - 1 - \tau_2 = \sqrt{k}$. Therefore, $X''$ is a $\sqrt{k}$-block-source, and we take $X' = X''$.

• If $i = 2$ then for any fixing of $\mathrm{left}(X'')$, we have that $H_\infty(\mathrm{right}(X'')) \ge \tau_2 = k - \sqrt{k} - 1$. We take $X'$ to be a subsource of $X''$ conditioned on an arbitrary fixing of $\mathrm{left}(X'')$.

□
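For the reader's convenience, the arithmetic behind the first two cases of the proof of Lemma 6.5 can be spelled out as follows. This is only a restatement of the bounds of Lemma 6.6 with the choice $\tau_1 = \sqrt{k}$, $\tau_2 = k - \sqrt{k} - 1$; no new claim is intended.

```latex
\begin{align*}
i = 0:\quad & H_\infty(\mathrm{left}(X'')) \;\ge\; k - 1 - \tau_1 \;=\; k - \sqrt{k} - 1, \\
i = 1:\quad & H_\infty(\mathrm{left}(X'')) \;\ge\; k - 1 - \tau_2
              \;=\; k - 1 - \bigl(k - \sqrt{k} - 1\bigr) \;=\; \sqrt{k}, \\
            & H_\infty(\mathrm{right}(X'')) \;\ge\; \tau_1 \;=\; \sqrt{k}
              \quad \text{for any fixing of } \mathrm{left}(X'').
\end{align*}
```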

The following fact follows by applying Lemma 6.5 iteratively as was described in Section 3.1 (see also Lemma 6.10 of [BRSW12], which proves essentially the same result).

Fact 6.7. Let $X$ be an $(n,k)$-source with $k = \omega(\log^8 n)$. Then, there exist an entropy-tree $T$ and a deficiency $\log n$ subsource $X_T \subset X$ that has a $T$-structure with parameters $(k/2, 2^{-\Omega(k^{1/10})})$.


7 The Two-Source Sub-Extractor 

In this section we describe our two-source sub-extractor. Let $n$ be a power of 2, and let $\ell$ be a parameter. On input $x, y \in \{0,1\}^n$, the computation of the sub-extractor is done in three steps.

Step 1 — Identify the entropy-paths 

Setting the challenges. With each node $v$ of $T_X$, we associate a $\log(n) \times \ell$ Boolean matrix, denoted by $\mathsf{Challenge}(x_v, y)$, computed from leaves to root, recursively as follows. All entries in rows $0, \ldots, \mathrm{depth}(v) - 1$ of $\mathsf{Challenge}(x_v, y)$ are fixed to 0. Row $\mathrm{depth}(v)$ of $\mathsf{Challenge}(x_v, y)$ contains $\mathsf{BExt}(x_v, y)$, where $\mathsf{BExt}$ is the extractor from Theorem 4.1 set to output $\ell$ bits. If $v$ is a non-leaf, rows $\mathrm{depth}(v) + 1, \ldots, \log(n) - 1$ are copied from the respective rows of $\mathsf{Challenge}(x_{\mathrm{leftSon}(v)}, y)$ or from the respective rows of $\mathsf{Challenge}(x_{\mathrm{rightSon}(v)}, y)$ according to the following rule. If

$$\mathsf{Resp}(x_v, y, \mathsf{Challenge}(x_{\mathrm{leftSon}(v)}, y)) = \mathsf{fixed}$$

then the remaining rows are taken from the corresponding rows of $\mathsf{Challenge}(x_{\mathrm{rightSon}(v)}, y)$. Otherwise, the rows are taken from the corresponding rows of $\mathsf{Challenge}(x_{\mathrm{leftSon}(v)}, y)$. In the first case we say that $v$ $(x, y)$-favors its right son, and in the second case we say that $v$ $(x, y)$-favors its left son.

Analogously, with each node $u$ of $T_Y$ we associate a $\log(n) \times \ell$ Boolean matrix, denoted by $\mathsf{Challenge}(y_u, x)$, defined recursively as follows. All entries in rows $0, \ldots, \mathrm{depth}(u) - 1$ of $\mathsf{Challenge}(y_u, x)$ are fixed to 0. Row $\mathrm{depth}(u)$ of $\mathsf{Challenge}(y_u, x)$ contains $\mathsf{BExt}(y_u, x)$. If $u$ is a non-leaf, rows $\mathrm{depth}(u) + 1, \ldots, \log(n) - 1$ are copied from the respective rows of $\mathsf{Challenge}(y_{\mathrm{leftSon}(u)}, x)$ or from the respective rows of $\mathsf{Challenge}(y_{\mathrm{rightSon}(u)}, x)$ according to the following rule. If

$$\mathsf{Resp}(y_u, x, \mathsf{Challenge}(y_{\mathrm{leftSon}(u)}, x)) = \mathsf{fixed}$$

then the remaining rows are taken from the corresponding rows of $\mathsf{Challenge}(y_{\mathrm{rightSon}(u)}, x)$. Otherwise, the rows are taken from the corresponding rows of $\mathsf{Challenge}(y_{\mathrm{leftSon}(u)}, x)$. In the first case we say that $u$ $(x, y)$-favors its right son, and in the second case we say that $u$ $(x, y)$-favors its left son.

Computing the entropy-paths. Let $v_0(x, y), v_1(x, y), \ldots, v_{\log(n)-1}(x, y)$ be the root-to-leaf path in $T_X$ defined by the property that $v_i(x, y)$ $(x, y)$-favors $v_{i+1}(x, y)$ for all $i = 0, 1, \ldots, \log(n) - 2$. Similarly, let $u_0(x, y), u_1(x, y), \ldots, u_{\log(n)-1}(x, y)$ be the root-to-leaf path in $T_Y$ defined by the property that $u_i(x, y)$ $(x, y)$-favors $u_{i+1}(x, y)$ for all $i = 0, 1, \ldots, \log(n) - 2$. We denote $v_0(x, y), \ldots, v_{\log(n)-1}(x, y)$ by $p_{\mathrm{observed}}(x, y)$ and call this path the observed entropy-path of $T_X$. Similarly, we denote $u_0(x, y), \ldots, u_{\log(n)-1}(x, y)$ by $q_{\mathrm{observed}}(x, y)$ and call this path the observed entropy-path of $T_Y$.

The computation done in Step 1. Given $x, y \in \{0,1\}^n$, at step 1 the sub-extractor computes $p_{\mathrm{observed}}(x, y)$ and $q_{\mathrm{observed}}(x, y)$. Clearly, this computation can be done in poly(n)-time.
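The recursion of Step 1 can be sketched in code as follows. This is an illustrative sketch only: `toy_bext` and `toy_resp` are hypothetical placeholders with none of the guarantees of $\mathsf{BExt}$ (Theorem 4.1) or $\mathsf{Resp}$ (Theorem 5.1), and we assume, for the sake of the example, that a node at depth $d$ corresponds to a contiguous block `x[lo:hi]` of $n/2^d$ bits whose two sons are its two halves.

```python
import hashlib
from typing import List, Tuple

Matrix = List[str]       # log(n) rows, each an ell-bit string
Node = Tuple[int, int]   # assumed encoding: node v <-> block x[lo:hi]


def toy_bext(a: str, b: str, ell: int) -> str:
    """Placeholder for BExt (NOT an extractor): hash down to ell <= 256 bits."""
    digest = hashlib.sha256(f"{a}|{b}".encode()).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:ell]


def toy_resp(x_v: str, y: str, left_challenge: Matrix) -> str:
    """Placeholder for Resp: an arbitrary deterministic rule."""
    ones = sum(row.count("1") for row in left_challenge)
    return "fixed" if ones % 2 == 0 else "hasEntropy"


def challenge(x: str, y: str, lo: int, hi: int, depth: int, rows: int,
              ell: int) -> Tuple[Matrix, List[Node]]:
    """Return Challenge(x_v, y) and the favored path from v down to a leaf."""
    x_v = x[lo:hi]
    # Rows 0..depth-1 are zero; row `depth` holds BExt(x_v, y).
    matrix = ["0" * ell] * depth + [toy_bext(x_v, y, ell)]
    if depth == rows - 1:            # leaf: no rows left to copy
        return matrix, [(lo, hi)]
    mid = (lo + hi) // 2
    left_m, left_p = challenge(x, y, lo, mid, depth + 1, rows, ell)
    right_m, right_p = challenge(x, y, mid, hi, depth + 1, rows, ell)
    # 'fixed' on the left son's challenge means v favors its right son.
    if toy_resp(x_v, y, left_m) == "fixed":
        child_m, child_p = right_m, right_p
    else:
        child_m, child_p = left_m, left_p
    return matrix + child_m[depth + 1:], [(lo, hi)] + child_p


def observed_path(x: str, y: str, ell: int) -> Tuple[Matrix, List[Node]]:
    rows = len(x).bit_length() - 1   # log2(n) for n a power of 2
    return challenge(x, y, 0, len(x), 0, rows, ell)
```

Calling `observed_path(x, y, ell)` returns the root's challenge matrix together with the observed path $p_{\mathrm{observed}}(x, y)$; swapping the roles of `x` and `y` gives $q_{\mathrm{observed}}(x, y)$. The recursion visits every node once, in line with the poly(n) running time noted above.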

Step 2 - Identify $v_{\mathrm{mid}}(T_X)$

Given $x$, $y$, $p_{\mathrm{observed}}(x, y)$, and $q_{\mathrm{observed}}(x, y)$, at the second step the algorithm computes $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$ as follows.

Setting the node-path challenges. Set $\ell' = \ell / \log^3 n$. Let $v$ be a node in $T_X$ and let $p = w_0, \ldots, w_{\log(n)-1}$ be a root-to-leaf path in $T_Y$. The node-path challenge associated with $(v, p)$, denoted by $\mathsf{NodePathCh}(x_v, y_p)$, is a $\log(n) \times \ell'$ Boolean matrix, defined as follows. For $j = 0, \ldots, \log(n) - 1$,

$$\mathsf{NodePathCh}(x_v, y_p)_j = \mathsf{BExt}(y_{w_j}, x_v),$$

where $\mathsf{BExt}$ is the extractor from Theorem 4.1 set to output $\ell'$ bits.

Computing $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$. We define $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$ to be the node $v$ in $p_{\mathrm{observed}}(x, y)$ with the largest depth such that

$$\mathsf{Response}(x, y, \mathsf{NodePathCh}(x_v, y_{q_{\mathrm{observed}}(x, y)})) = \mathsf{hasEntropy}.$$

If no such node exists we define $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$, arbitrarily, as $\mathrm{root}(T_X)$. Note that computing $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$ can be done in time poly(n).
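The search performed in Step 2 can be sketched as a scan of the observed path from the deepest node upwards. This is a hedged illustration: the callables `bext` and `response` are parameters standing in for $\mathsf{BExt}$ and $\mathsf{Response}$, and nodes are again encoded (as an assumption of this sketch) by the block `(lo, hi)` of the string they index.

```python
from typing import Callable, List, Tuple

Node = Tuple[int, int]  # assumed encoding: node <-> block s[lo:hi]


def node_path_challenge(x_v: str, y_blocks: List[str], ell_prime: int,
                        bext: Callable[[str, str, int], str]) -> List[str]:
    """Row j is BExt(y_{w_j}, x_v), where w_j is the j-th node of the path."""
    return [bext(y_w, x_v, ell_prime) for y_w in y_blocks]


def observed_mid(x: str, y: str, p_observed: List[Node],
                 q_observed: List[Node], ell_prime: int,
                 bext: Callable[[str, str, int], str],
                 response: Callable[[str, str, List[str]], str]) -> Node:
    """Deepest node on p_observed whose node-path challenge has entropy."""
    y_blocks = [y[lo:hi] for lo, hi in q_observed]
    for lo, hi in reversed(p_observed):          # deepest node first
        ch = node_path_challenge(x[lo:hi], y_blocks, ell_prime, bext)
        if response(x, y, ch) == "hasEntropy":
            return (lo, hi)
    return p_observed[0]                         # arbitrary default: the root
```

Scanning from the leaf towards the root means the first node that answers `hasEntropy` is the one of largest depth, matching the definition of $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$.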

Step 3 — Determine the output 

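Given $x$, $y$ and $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$ computed in the previous step, the output of the sub-extractor is defined by

$$\mathsf{SubExt}(x, y) = \mathsf{BExt}\left(x_{v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)} \circ x, \; y\right),$$

where by $x_{v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)} \circ x$ we mean the block-source with first block $x_{v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)}$ and second block that contains $x$. Technically, we need to append $x_{v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)}$ with zeros so as to obtain a length $n$ string. Similarly, we append $y$ with $n$ zeros so as to obtain a $2n$-bit string.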
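The padding convention of Step 3 can be made concrete as follows. This is a small sketch; the function name `step3_inputs` is a hypothetical helper introduced here, and bit strings are modeled as Python strings.

```python
from typing import Tuple


def step3_inputs(x: str, x_v: str, y: str) -> Tuple[str, str]:
    """Build the two 2n-bit inputs that Step 3 feeds to BExt.

    The first input is the block-source x_v o x: x_v zero-padded to n bits,
    followed by x. The second input is y followed by n zeros.
    """
    n = len(x)
    first_block = x_v + "0" * (n - len(x_v))
    return first_block + x, y + "0" * n
```

For instance, with `x = "0110"`, `x_v = "01"` and `y = "1011"`, the two inputs are `"01000110"` and `"10110000"`, both of length $2n = 8$.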

Recap 

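We end this section by recapping the three steps in the computation of the sub-extractor. On input $x, y \in \{0,1\}^n$:

1. Compute $p_{\mathrm{observed}}(x, y)$ and $q_{\mathrm{observed}}(x, y)$.

2. Compute $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$.

3. Output $\mathsf{BExt}\left(x_{v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)} \circ x, \; y\right)$.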

8 Analysis of the Construction 

In this section we prove Theorem 1.10 by analyzing the algorithm described in Section 7. The proof is done in three steps, following the three steps of the construction. By Fact 6.7, we may assume that $X$ has a $T_X$-structure and that $Y$ has a $T_Y$-structure for some entropy-trees $T_X, T_Y$, both with parameters $(k/2, 2^{-\Omega(k^{1/10})})$. This costs only $\log n$ in deficiency and introduces a small error of $2^{-\Omega(k^{1/10})}$. Throughout the analysis we only consider subsources with deficiency $o(k^{1/10})$ and so this error can be ignored.

8.1 Analysis of Step 1

We start this section by proving the following claim. 

Claim 8.1. There exist deficiency $\ell \log^2 n$ subsources $X_{\mathsf{F}} \subset X$, $Y_{\mathsf{F}} \subset Y$ with the following property. For every node $v$ in $T_X$ that is labeled by $\mathsf{F}$, it holds that $\mathsf{Challenge}((X_{\mathsf{F}})_v, Y_{\mathsf{F}})$ is fixed to a constant. Further, for every node $u$ of $T_Y$ that is labeled by $\mathsf{F}$, it holds that $\mathsf{Challenge}((Y_{\mathsf{F}})_u, X_{\mathsf{F}})$ is fixed to a constant.

Proof. Let $v$ be a node in $T_X$ that is labeled by $\mathsf{F}$. Since $X$ has a $T_X$-structure, $X_v$ is fixed to a constant, and so $\mathsf{Challenge}(X_v, Y)$ is a deterministic function of $Y$. Since $\mathsf{Challenge}(X_v, Y)$ consists of $\ell \log n$ bits, Fact 4.4 implies that there exists a deficiency $\ell \log n$ subsource $Y' \subset Y$ such that $\mathsf{Challenge}(X_v, Y')$ is fixed to a constant. Repeating this argument for every $v \in T_X$ that is labeled by $\mathsf{F}$, we get a subsource $Y_{\mathsf{F}} \subset Y$ such that $\mathsf{Challenge}(X_v, Y_{\mathsf{F}})$ is fixed to a constant for every $v$ in $T_X$ that is labeled by $\mathsf{F}$. By the definition of an entropy-tree, there is at most one node labeled by $\mathsf{F}$ in each level of $T_X$ and since $\mathrm{depth}(T_X) = \log n$, we have that $Y_{\mathsf{F}}$ is a deficiency $\ell \log^2 n$ subsource of $Y$.

Since $Y_{\mathsf{F}}$ is a subsource of $Y$, for every node $u$ in $T_Y$ that is labeled by $\mathsf{F}$ it holds that $(Y_{\mathsf{F}})_u$ is fixed to a constant. We now perform the analogous process on $T_Y$ to obtain a deficiency $\ell \log^2 n$ subsource $X_{\mathsf{F}} \subset X$ such that for every node $u$ in $T_Y$ that is labeled by $\mathsf{F}$ it holds that $\mathsf{Challenge}((Y_{\mathsf{F}})_u, X_{\mathsf{F}})$ is fixed to a constant. Note that since $X_{\mathsf{F}}$ is a subsource of $X$, it also holds that $\mathsf{Challenge}((X_{\mathsf{F}})_v, Y_{\mathsf{F}})$ is fixed to a constant for every $v$ in $T_X$ that is labeled by $\mathsf{F}$. Thus, informally speaking, by performing the analogous process to $T_Y$ we do not “ruin” the desired property we obtained first for $T_X$. □
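The deficiency bound in the proof of Claim 8.1 is a simple count, which can be written out as follows (a restatement of the argument above, with $\#\mathsf{F}(T_X)$ introduced here as shorthand for the number of nodes of $T_X$ labeled by $\mathsf{F}$):

```latex
\begin{align*}
\mathrm{deficiency}(Y_{\mathsf{F}} \subset Y)
  \;&\le\; \#\mathsf{F}(T_X) \cdot \underbrace{\ell \log n}_{\text{bits of one challenge}} \\
  \;&\le\; \underbrace{\log n}_{\text{levels of } T_X} \cdot \; \ell \log n
  \;=\; \ell \log^2 n .
\end{align*}
```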

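Next we show that there exist low-deficiency subsources $X_{\mathsf{FI}} \subset X_{\mathsf{F}}$, $Y_{\mathsf{FI}} \subset Y_{\mathsf{F}}$ (FI stands for “fixed identified”), restricted to which, $\mathsf{SubExt}$ correctly identifies the nodes in $T_X, T_Y$ that are labeled by $\mathsf{F}$.

Claim 8.2. There exist deficiency $O(\ell \log^2 n)$ subsources $X_{\mathsf{FI}} \subset X_{\mathsf{F}}$, $Y_{\mathsf{FI}} \subset Y_{\mathsf{F}}$ with the following property. For every node $v$ of $T_X$ that is labeled by $\mathsf{F}$ and for every node $u$ of $T_Y$ that is labeled by $\mathsf{F}$, it holds that

$$\Pr[\mathrm{parent}(v) \; (X_{\mathsf{FI}}, Y_{\mathsf{FI}})\text{-favors } v] = 0,$$

$$\Pr[\mathrm{parent}(u) \; (X_{\mathsf{FI}}, Y_{\mathsf{FI}})\text{-favors } u] = 0.$$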

Proof. Let $v$ be a node in $T_X$ that is labeled by $\mathsf{F}$. We first note that by the definition of an entropy-tree, $\mathrm{root}(T_X)$ cannot be labeled by $\mathsf{F}$, and so it is valid to refer to $\mathrm{parent}(v)$. Further, by the definition of an entropy-tree, if a node is labeled by $\mathsf{F}$ then it must be the left son of its parent. Thus, $\mathrm{parent}(v)$ $(x, y)$-favors $v$ if and only if

$$\mathsf{Response}(x_{\mathrm{parent}(v)}, y, \mathsf{Challenge}(x_{\mathrm{leftSon}(\mathrm{parent}(v))}, y)) = \mathsf{hasEntropy}. \quad (8.1)$$

By Claim 8.1, $\mathsf{Challenge}((X_{\mathsf{F}})_v, Y_{\mathsf{F}})$ is fixed to a constant. Thus, to apply the challenge-response mechanism, we only need to show that both $(X_{\mathsf{F}})_{\mathrm{parent}(v)}$ and $Y_{\mathsf{F}}$ have a sufficient amount of entropy. By the definition of an entropy-tree, since $\mathrm{label}(v) = \mathsf{F}$, $\mathrm{label}(\mathrm{parent}(v)) \in \{\mathsf{H}, \mathsf{B}_{\mathrm{top}}, \mathsf{B}_{\mathrm{mid}}\}$ and so $X_{\mathrm{parent}(v)}$ is $2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(k^{1/4})$. Since $X_{\mathsf{F}}$ is a deficiency $O(\ell \log^2 n)$ subsource of $X$, Fact 4.3 implies that $(X_{\mathsf{F}})_{\mathrm{parent}(v)}$ is $2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(k^{1/4}) - O(\ell \log^2 n) = \Omega(k^{1/4})$. By a similar argument, $Y_{\mathsf{F}}$ is $2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(k)$. Since $k^{1/4} = \omega(\log^{10} n)$, Theorem 5.1 implies that there exist deficiency $2\ell \log n$ subsources $X' \subset X_{\mathsf{F}}$, $Y' \subset Y_{\mathsf{F}}$ such that for any $(x, y) \in \mathrm{supp}((X', Y'))$, Equation (8.1) fails to hold. Thus, for any such $x, y$ it holds that $\mathrm{parent}(v)$ does not $(x, y)$-favor $v$.

We repeat this argument for every node $v$ in $T_X$ that is labeled by $\mathsf{F}$ and obtain deficiency $2\ell \log^2 n$ subsources $X'' \subset X_{\mathsf{F}}$, $Y'' \subset Y_{\mathsf{F}}$ with the property that for every $v$ in $T_X$ that is labeled by $\mathsf{F}$ it holds that

$$\Pr[\mathrm{parent}(v) \; (X'', Y'')\text{-favors } v] = 0.$$

This is possible since the entropy of $Y$ and the entropies of $X_v$ for $v$ labeled by one of $\{\mathsf{H}, \mathsf{B}_{\mathrm{top}}, \mathsf{B}_{\mathrm{mid}}\}$ remain large enough throughout the process.

We now apply the same argument for every node $u$ in $T_Y$ that is labeled by $\mathsf{F}$. Since $X''$ and $Y''$ are deficiency $O(\ell \log^2 n)$ subsources of $X_{\mathsf{F}}$, $Y_{\mathsf{F}}$, respectively, we can obtain deficiency $O(\ell \log^2 n)$ subsources $X_{\mathsf{FI}} \subset X_{\mathsf{F}}$, $Y_{\mathsf{FI}} \subset Y_{\mathsf{F}}$, such that for any node $u$ in $T_Y$ that is labeled by $\mathsf{F}$ it holds that

$$\Pr[\mathrm{parent}(u) \; (X_{\mathsf{FI}}, Y_{\mathsf{FI}})\text{-favors } u] = 0.$$

We note that since $X_{\mathsf{FI}}$ and $Y_{\mathsf{FI}}$ are subsources of $X''$, $Y''$, it also holds that for every node $v$ in $T_X$ that is labeled by $\mathsf{F}$,

$$\Pr[\mathrm{parent}(v) \; (X_{\mathsf{FI}}, Y_{\mathsf{FI}})\text{-favors } v] = 0.$$

That is, we have not “ruined” the desired property we obtained first in $T_X$ when working on $T_Y$. This concludes the proof of the claim. □

Up to this point, we found low-deficiency subsources $X_{\mathsf{FI}} \subset X$ and $Y_{\mathsf{FI}} \subset Y$ such that the nodes labeled by $\mathsf{F}$ in $T_X, T_Y$ are correctly identified by the challenge-response mechanism when applied to samples from $X_{\mathsf{FI}}, Y_{\mathsf{FI}}$. Next we prove that with high probability over $(X_{\mathsf{FI}}, Y_{\mathsf{FI}})$, the entropy-paths in $T_X, T_Y$ are identified correctly by the sub-extractor in the sense that the observed entropy-paths contain the entropy-paths of the respective entropy-trees.

Claim 8.3. Except with probability $2^{-\Omega(\ell)}$ over $(x, y) \sim (X_{\mathsf{FI}}, Y_{\mathsf{FI}})$, it holds that

$$\forall i \in \{0, \ldots, i_{\mathrm{bot}}(T_X)\} \quad v_i(x, y) = v_i(T_X),$$
$$\forall i \in \{0, \ldots, i_{\mathrm{bot}}(T_Y)\} \quad u_i(x, y) = u_i(T_Y).$$


Proof. We prove the first equation in the statement of the claim. The proof of the second equation is similar, and then the proof of the claim follows by the union bound. We first observe that by the definition of an entropy-tree, for any ancestor $v \ne v_{\mathrm{bot}}(T_X)$ of $v_{\mathrm{bot}}(T_X)$ it holds that $\mathrm{label}(\mathrm{leftSon}(v)) = \mathsf{F}$ if and only if $\mathrm{rightSon}(v)$ is the good son of $v$. Indeed, on one hand, if $\mathrm{leftSon}(v)$ is labeled by $\mathsf{F}$ then $\mathrm{leftSon}(v)$ cannot be an ancestor of $v_{\mathrm{bot}}(T_X)$ as all of $\mathrm{leftSon}(v)$'s descendants have no label. On the other hand, since $v$ has a label and its label can only be one of $\mathsf{H}$, $\mathsf{B}_{\mathrm{top}}$, $\mathsf{B}_{\mathrm{mid}}$, if its left son is not labeled by $\mathsf{F}$ then $\mathrm{rightSon}(v)$ has no label, and so $\mathrm{rightSon}(v)$ cannot be an ancestor of $v_{\mathrm{bot}}(T_X)$ as all of its descendants have no label.

Ideally, given this observation, we would have liked to prove by a backward induction on $i = i_{\mathrm{bot}}(T_X) - 1, \ldots, 1, 0$ that

$$\Pr_{(x,y) \sim (X_{\mathsf{FI}}, Y_{\mathsf{FI}})} [\forall j \in \{i, \ldots, i_{\mathrm{bot}}(T_X) - 1\} \;\; v_j(T_X) \; (x, y)\text{-favors its good son}] \ge 1 - 2^{-\Omega(\ell)}.$$

Indeed, note that the claim will then follow by considering $i = 0$. However, we need to prove a stronger statement so as to have a stronger induction hypothesis, as otherwise we will not be able to carry out the induction step. More precisely, set $t = 20 \ell \log n$. Let $\varepsilon_{i_{\mathrm{bot}}(T_X) - 1} = 2^{-\Omega(\ell)}$. For $i = i_{\mathrm{bot}}(T_X) - 2, \ldots, 1, 0$, define $\varepsilon_i = (2^{-\Omega(\ell)} + \varepsilon_{i+1}) \cdot \mathrm{poly}(n)$. We prove by a backward induction on $i = i_{\mathrm{bot}}(T_X) - 1, \ldots, 1, 0$ that for any deficiency $it$ subsources $X' \subset X_{\mathsf{FI}}$, $Y' \subset Y_{\mathsf{FI}}$, it holds that

$$\Pr_{(x,y) \sim (X', Y')} [\forall j \in \{i, \ldots, i_{\mathrm{bot}}(T_X) - 1\} \;\; v_j(T_X) \; (x, y)\text{-favors its good son}] \ge 1 - \varepsilon_i.$$

We note that the claim follows by considering $i = 0$ as $\varepsilon_0 = 2^{-\Omega(\ell)} \cdot 2^{O(\log^2 n)} = 2^{-\Omega(\ell)}$.
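To see why the recurrence for $\varepsilon_i$ loses only a $2^{O(\log^2 n)}$ factor, one can unroll it numerically. The sketch below is illustrative: `base` plays the role of the $2^{-\Omega(\ell)}$ term and `poly` the role of the $\mathrm{poly}(n)$ factor, both with concrete values chosen here only for the example. It compares the unrolled value with the bound $\texttt{base} \cdot (2 \cdot \texttt{poly})^{s}$ after $s$ steps, which follows by induction whenever $\texttt{poly} \ge 1$.

```python
from fractions import Fraction


def unroll(base: Fraction, poly: int, steps: int) -> Fraction:
    """Apply eps <- (base + eps) * poly, `steps` times, starting at eps = base."""
    eps = base
    for _ in range(steps):
        eps = (base + eps) * poly
    return eps


def closed_form_bound(base: Fraction, poly: int, steps: int) -> Fraction:
    """Upper bound base * (2 * poly)^steps, valid for poly >= 1."""
    return base * (2 * poly) ** steps
```

With at most $\log n$ steps and `poly` equal to $\mathrm{poly}(n)$, the bound is $\texttt{base} \cdot 2^{O(\log^2 n)}$, matching the estimate for $\varepsilon_0$ stated above.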

We start with the base of the induction $i = i_{\mathrm{bot}}(T_X) - 1$. Let $X' \subset X_{\mathsf{FI}}$, $Y' \subset Y_{\mathsf{FI}}$ be deficiency $(i_{\mathrm{bot}}(T_X) - 1)t$ subsources. Consider two cases according to the label of $\mathrm{leftSon}(v_{i_{\mathrm{bot}}(T_X) - 1})$. If $\mathrm{label}(\mathrm{leftSon}(v_{i_{\mathrm{bot}}(T_X) - 1})) = \mathsf{F}$ then by Claim 8.2,

$$\Pr_{(x,y) \sim (X_{\mathsf{FI}}, Y_{\mathsf{FI}})} [v_{i_{\mathrm{bot}}(T_X) - 1} \; (x, y)\text{-favors } \mathrm{leftSon}(v_{i_{\mathrm{bot}}(T_X) - 1})] = 0.$$

Since $X', Y'$ are subsources of $X_{\mathsf{FI}}, Y_{\mathsf{FI}}$, respectively, the same holds for $(x, y) \sim (X', Y')$. Moreover, as $\mathrm{rightSon}(v_{i_{\mathrm{bot}}(T_X) - 1})$ is the good son of $v_{i_{\mathrm{bot}}(T_X) - 1}$, the basis of the induction for this case is proven.

Consider now the case $\mathrm{label}(\mathrm{leftSon}(v_{i_{\mathrm{bot}}(T_X) - 1})) \ne \mathsf{F}$. By the observation above, in this case, the good son of $v_{i_{\mathrm{bot}}(T_X) - 1}$ is its left son and so $\mathrm{leftSon}(v_{i_{\mathrm{bot}}(T_X) - 1}) = v_{\mathrm{bot}}(T_X)$. Thus, $v_{i_{\mathrm{bot}}(T_X) - 1}$ $(x, y)$-favors its good son if and only if

$$\mathsf{Response}(x_{v_{i_{\mathrm{bot}}(T_X) - 1}}, y, \mathsf{Challenge}(x_{v_{\mathrm{bot}}(T_X)}, y)) = \mathsf{hasEntropy}. \quad (8.2)$$

Thus, to conclude the proof of the base case, it is enough to show that Equation (8.2) holds with probability $1 - 2^{-\Omega(\ell)}$ over $(x, y) \sim (X', Y')$. To this end, first note that $\mathsf{Challenge}(x_{v_{\mathrm{bot}}(T_X)}, y)$ contains $\mathsf{BExt}(x_{v_{\mathrm{bot}}(T_X)}, y)$ as a row. By Theorem 5.1, it is enough to show that for all deficiency $t$ subsources $\hat{X} \subset X'$, $\hat{Y} \subset Y'$, it holds that $\mathsf{BExt}(\hat{X}_{v_{\mathrm{bot}}(T_X)}, \hat{Y})$ is close to uniform.

Since $\hat{X}$ is a deficiency $O(i_{\mathrm{bot}} t + \ell \log^2 n) = O(\ell \log^2 n)$ subsource of $X$ and since $X_{v_{\mathrm{bot}}(T_X)}$ is $2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/8})$-block-source, $\hat{X}_{v_{\mathrm{bot}}(T_X)}$ is also $2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/8})$-block-source. Further, since $\hat{Y}$ is a deficiency $O(\ell \log^2 n)$ subsource of $Y$, and since $H_\infty(Y) \ge k$, $H_\infty(\hat{Y}) = \Omega(k)$. Since $k^{1/8} = \omega(\log^{12} n)$, Theorem 4.1 implies that $\mathsf{BExt}(\hat{X}_{v_{\mathrm{bot}}(T_X)}, \hat{Y})$ is $2^{-k^{\Omega(1)}}$-close to a uniform string on $\ell$ bits. Thus, by Theorem 5.1, Equation (8.2) holds with probability at least $1 - 2^{-\Omega(\ell)}$.

We now proceed to the induction step. Let $0 \le i < i_{\mathrm{bot}}(T_X) - 1$. Let $X' \subset X_{\mathsf{FI}}$, $Y' \subset Y_{\mathsf{FI}}$ be deficiency $it$ subsources. We want to show that

$$\Pr_{(x,y) \sim (X', Y')} [\forall j \in \{i, \ldots, i_{\mathrm{bot}}(T_X) - 1\} \;\; v_j(T_X) \text{ favors its good son}] \ge 1 - \varepsilon_i.$$

By the induction hypothesis, for any deficiency $(i + 1)t$ subsources $X'' \subset X_{\mathsf{FI}}$, $Y'' \subset Y_{\mathsf{FI}}$, it holds that

$$\Pr_{(x,y) \sim (X'', Y'')} [\forall j \in \{i + 1, \ldots, i_{\mathrm{bot}}(T_X) - 1\} \;\; v_j(T_X) \text{ favors its good son}] \ge 1 - \varepsilon_{i+1}.$$

As was done in the basis of the induction, we consider two cases. If $\mathrm{label}(\mathrm{leftSon}(v_i(T_X))) = \mathsf{F}$ then by Claim 8.2,

$$\Pr_{(x,y) \sim (X_{\mathsf{FI}}, Y_{\mathsf{FI}})} [v_i(T_X) \; (x, y)\text{-favors its good son}] = 1.$$

Since $X' \subset X_{\mathsf{FI}}$ and $Y' \subset Y_{\mathsf{FI}}$, the same holds for $(x, y) \sim (X', Y')$. Thus, by the induction hypothesis

$$\Pr_{(x,y) \sim (X', Y')} [\forall j \in \{i, \ldots, i_{\mathrm{bot}}(T_X) - 1\} \;\; v_j(T_X) \text{ favors its good son}] \ge 1 - \varepsilon_{i+1} \ge 1 - \varepsilon_i.$$

Consider now the case $\mathrm{label}(\mathrm{leftSon}(v_i(T_X))) \ne \mathsf{F}$. By the observation made at the beginning of the proof, in this case the good son of $v_i(T_X)$ is its left son. Thus, $v_i(T_X)$ $(x, y)$-favors its good son if and only if

$$\mathsf{Response}(x_{v_i(T_X)}, y, \mathsf{Challenge}(x_{\mathrm{leftSon}(v_i(T_X))}, y)) = \mathsf{hasEntropy}. \quad (8.3)$$

By Theorem 5.1, it is enough to show that for any deficiency $t$ subsources $\hat{X} \subset X'$, $\hat{Y} \subset Y'$, it holds that $\mathsf{Challenge}(\hat{X}_{\mathrm{leftSon}(v_i(T_X))}, \hat{Y})$ is $2^{-k^{\Omega(1)}}$-close to having min-entropy $\ell$. Since $\hat{X}$ is a deficiency $t$ subsource of $X'$, and since $X'$ is a deficiency $it$ subsource of $X_{\mathsf{FI}}$, we have that $\hat{X}$ is a deficiency $(i + 1)t$ subsource of $X_{\mathsf{FI}}$. Similarly, $\hat{Y}$ is a deficiency $(i + 1)t$ subsource of $Y_{\mathsf{FI}}$. Thus, by the induction hypothesis,

$$\Pr_{(x,y) \sim (\hat{X}, \hat{Y})} [\forall j \in \{i + 1, \ldots, i_{\mathrm{bot}}(T_X) - 1\} \;\; v_j(T_X) \text{ favors its good son}] \ge 1 - \varepsilon_{i+1}.$$

By the above equation and by the definition of $\mathsf{Challenge}$, except with probability $\varepsilon_{i+1}$ over $(x, y) \sim (\hat{X}, \hat{Y})$, it holds that $\mathsf{BExt}(x_{v_{\mathrm{bot}}(T_X)}, y)$ appears as a row in $\mathsf{Challenge}(x_{v_{i+1}(T_X)}, y)$.

Since $\hat{X}$ is a deficiency $O((i + 1)t + \ell \log^2 n) = O(\ell \log^2 n)$ subsource of $X$ and since $X_{v_{\mathrm{bot}}(T_X)}$ is $2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/8})$-block-source, $\hat{X}_{v_{\mathrm{bot}}(T_X)}$ is also $2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/8})$-block-source. Further, since $\hat{Y}$ is a deficiency $O(\ell \log^2 n)$ subsource of $Y$ and since $H_\infty(Y) \ge k$, $H_\infty(\hat{Y}) = \Omega(k)$. As we assume $k^{1/8} = \omega(\log^{12} n)$, Theorem 4.1 implies that $\mathsf{BExt}(\hat{X}_{v_{\mathrm{bot}}(T_X)}, \hat{Y})$ is $2^{-k^{\Omega(1)}}$-close to uniform. Thus, except with probability $\varepsilon_{i+1} + 2^{-k^{\Omega(1)}}$, $\mathsf{Challenge}(\hat{X}_{\mathrm{leftSon}(v_i(T_X))}, \hat{Y})$ has min-entropy $\ell$. Thus, by Theorem 5.1, Equation (8.3) holds with probability at least $1 - (2^{-\Omega(\ell)} + \varepsilon_{i+1}) \cdot \mathrm{poly}(n) = 1 - \varepsilon_i$. This concludes the proof of the claim. □

8.2 Analysis of Step 2

Informally speaking, in this section we prove that the sub-extractor correctly identifies $v_{\mathrm{mid}}(T_X)$ in some carefully chosen subsources of $X_{\mathsf{FI}}, Y_{\mathsf{FI}}$. More precisely, we would have wanted to prove a statement of the following form:

A wishful claim. There exist low-deficiency subsources $X' \subset X_{\mathsf{FI}}$, $Y' \subset Y_{\mathsf{FI}}$ such that with high probability over $(x, y) \sim (X', Y')$, $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y) = v_{\mathrm{mid}}(T_X)$.

Unfortunately, we will not be able to prove this statement. We will, however, be able to prove the same statement for $X', Y'$ that have high deficiency in $X_{\mathsf{FI}}, Y_{\mathsf{FI}}$. Still, $X', Y'$ will have enough entropy and structure so as to carry out the rest of the analysis. Furthermore, the error term that we are carrying will not cause any harm even after moving to these high-deficiency subsources.

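For $\alpha \in \mathrm{supp}((X_{\mathsf{FI}})_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))})$ and $\beta \in \mathrm{supp}((Y_{\mathsf{FI}})_{\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))})$, we define

$$X_\alpha = X_{\mathsf{FI}} \mid ((X_{\mathsf{FI}})_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))} = \alpha),$$
$$Y_\beta = Y_{\mathsf{FI}} \mid ((Y_{\mathsf{FI}})_{\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))} = \beta).$$

Let $B$ be the set of all $(x, y) \in \mathrm{supp}((X_{\mathsf{FI}}, Y_{\mathsf{FI}}))$ such that

$$\exists i \in \{0, \ldots, i_{\mathrm{bot}}(T_X)\} \;\; v_i(x, y) \ne v_i(T_X) \quad \vee \quad \exists i \in \{0, \ldots, i_{\mathrm{bot}}(T_Y)\} \;\; u_i(x, y) \ne u_i(T_Y). \quad (8.4)$$

By Claim 8.3,

$$\Pr[(X_{\mathsf{FI}}, Y_{\mathsf{FI}}) \in B] \le 2^{-\Omega(\ell)}.$$

Thus, by averaging, there exist $\alpha, \beta$ such that

$$\Pr[(X_\alpha, Y_\beta) \in B] \le 2^{-\Omega(\ell)}.$$

These are the subsources $X_\alpha \subset X_{\mathsf{FI}}$, $Y_\beta \subset Y_{\mathsf{FI}}$ that we will work with. We think of $(x, y) \in B$ as an “error” and ignore this event for now. We later accumulate the error coming from this event while making sure to treat the error correctly when moving into subsources of $(X_\alpha, Y_\beta)$. More precisely, recall that by moving to a deficiency $d$ subsource, an error of $\varepsilon$ in the source can “grow” to at most $2^d \cdot \varepsilon$ restricted to the subsource. Since the error term is $2^{-\Omega(\ell)}$ and since we will move to deficiency $o(\ell)$-subsources, the error will remain $2^{-\Omega(\ell)}$ in the subsources that we will restrict to. Thus, we assume that the event described in Equation (8.4) does not occur, that is, $(x, y) \notin B$. In particular, we assume that $v_{i_{\mathrm{top}}(T_X)}(X_\alpha, Y_\beta) = v_{\mathrm{top}}(T_X)$, $v_{i_{\mathrm{mid}}(T_X)}(X_\alpha, Y_\beta) = v_{\mathrm{mid}}(T_X)$, etc.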
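The statement that an error of $\varepsilon$ can grow to at most $2^d \cdot \varepsilon$ in a deficiency $d$ subsource can be checked on a toy example. The sketch below assumes flat sources, that is, a source is uniform over its support and a deficiency $d$ subsource is uniform over a subset containing a $2^{-d}$ fraction of that support; these modeling choices, and the helper name `bad_probability`, are assumptions of the example.

```python
from fractions import Fraction
from typing import Iterable


def bad_probability(support: Iterable[int], bad: Iterable[int]) -> Fraction:
    """Probability that a uniform sample from `support` lands in `bad`."""
    support = set(support)
    return Fraction(len(support & set(bad)), len(support))


# A flat source on 1024 points with a bad set of 4 points: eps = 4/1024.
source = range(1024)
bad = {0, 1, 2, 3}
eps = bad_probability(source, bad)

# Worst-case deficiency-3 subsource: 128 points that contain the whole bad set.
worst_subsource = range(128)
grown = bad_probability(worst_subsource, bad)
```

Here `grown` equals `2**3 * eps`, so the bound is tight in the worst case, while a subsource that avoids the bad set entirely has error 0.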

Recall that $v^{\mathrm{observed}}_{\mathrm{mid}}(x, y)$ is defined to be the node $v$ in $p_{\mathrm{observed}}(x, y)$ with the largest depth such that

$$\mathsf{Response}(x, y, \mathsf{NodePathCh}(x_v, y_{q_{\mathrm{observed}}(x, y)})) = \mathsf{hasEntropy}. \quad (8.5)$$

Thus, to show that $v_{\mathrm{mid}}(T_X)$ is correctly identified on low-deficiency subsources of $(X_\alpha, Y_\beta)$, we first show that there exist low-deficiency subsources $X_{\alpha,\beta} \subset X_\alpha$, $Y_{\alpha,\beta} \subset Y_\beta$ such that with high probability over $(x, y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$, Equation (8.5) does not hold with $v = v_i(x, y)$ for all $i > i_{\mathrm{mid}}(T_X)$. This is the content of the following claim. Afterwards, in Claim 8.7, we show that with high probability over $(x, y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$, Equation (8.5) holds with $v = v_{i_{\mathrm{mid}}(T_X)}(x, y) = v_{\mathrm{mid}}(T_X)$.

Claim 8.4. There exist deficiency $O(\ell' \log^2 n)$-subsources $X_{\alpha,\beta} \subset X_\alpha$, $Y_{\alpha,\beta} \subset Y_\beta$ such that with probability $1 - 2^{-\Omega(\ell)}$ over $(x, y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$, it holds that

$$\forall i > i_{\mathrm{mid}}(T_X) \quad \mathsf{Response}(x, y, \mathsf{NodePathCh}(x_{v_i(x, y)}, y_{q_{\mathrm{observed}}(x, y)})) = \mathsf{fixed}. \quad (8.6)$$

Towards proving Claim 8.4, we start by proving the following two claims. 

Claim 8.5. There exists a deficiency $\log n$ subsource $X'_\alpha \subset X_\alpha$ such that $q_{\mathrm{observed}}(X'_\alpha, Y_\beta)$ is fixed to a constant.

Proof. Recall that $q_{\mathrm{observed}}(X_\alpha, Y_\beta)$ is the path

$$u_0(X_\alpha, Y_\beta), \ldots, u_{i_{\mathrm{bot}}(T_Y)}(X_\alpha, Y_\beta), u_{i_{\mathrm{bot}}(T_Y)+1}(X_\alpha, Y_\beta), \ldots, u_{\log(n)-1}(X_\alpha, Y_\beta).$$

By our assumption that $(x, y) \notin B$ (see Equation (8.4)), for $i \le i_{\mathrm{bot}}(T_Y)$ it holds that $u_i(X_\alpha, Y_\beta) = u_i(T_Y)$, and so for such $i$, $u_i(X_\alpha, Y_\beta)$ is fixed to a constant. We now consider the case $i > i_{\mathrm{bot}}(T_Y)$. Consider first the case $i = i_{\mathrm{bot}}(T_Y) + 1$. In this case, $u_i(X_\alpha, Y_\beta)$ determines which of the two sons of $u_{i_{\mathrm{bot}}(T_Y)}(x, y) = u_{\mathrm{bot}}(T_Y)$ is on the observed entropy-path $q_{\mathrm{observed}}(X_\alpha, Y_\beta)$. Recall that this decision is based on whether or not

$$\mathsf{Response}((Y_\beta)_{u_{\mathrm{bot}}(T_Y)}, X_\alpha, \mathsf{Challenge}((Y_\beta)_{\mathrm{leftSon}(u_{\mathrm{bot}}(T_Y))}, X_\alpha)) = \mathsf{hasEntropy}. \quad (8.7)$$

Since $u_{\mathrm{bot}}(T_Y)$ and $\mathrm{leftSon}(u_{\mathrm{bot}}(T_Y))$ are descendants of $\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))$, as follows by the definition of an entropy-tree, and since $(Y_\beta)_{\mathrm{leftSon}(u_{\mathrm{mid}}(T_Y))}$ is fixed to $\beta$, it also holds that $(Y_\beta)_{u_{\mathrm{bot}}(T_Y)}$ and $(Y_\beta)_{\mathrm{leftSon}(u_{\mathrm{bot}}(T_Y))}$ are fixed to constants. Thus, Equation (8.7) is determined only by $X_\alpha$. Since Equation (8.7) gives one bit of information on $X_\alpha$, by Fact 4.4, there exists a deficiency 1 subsource $X' \subset X_\alpha$ such that $u_i(X', Y_\beta)$ is fixed to a constant.

We now repeat this argument for $i = i_{\mathrm{bot}}(T_Y) + 2, \ldots, \log(n) - 1$. In each iteration we make sure that the next descendant of $u_{\mathrm{bot}}(T_Y)$ is fixed to a constant on a low-deficiency subsource of $X_\alpha$ with $Y_\beta$. Since we repeat this process for at most $\log n$ iterations, we will eventually obtain a deficiency $\log n$ subsource $X'_\alpha \subset X_\alpha$ such that $q_{\mathrm{observed}}(X'_\alpha, Y_\beta)$ is fixed to a constant, as desired. Accounting back for the error, note that since $\ell = \omega(\log n)$, it holds that $\Pr[(X'_\alpha, Y_\beta) \in B] \le 2^{-\Omega(\ell)}$. □

Claim 8.5 allows us to set the ground for the challenge-response mechanism: 

Claim 8.6. For any $i > i_{\mathrm{mid}}(T_X)$,

$$\mathsf{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}, (Y_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y_\beta)})$$

is a deterministic function of $Y_\beta$.

Proof. By Claim 8.5 we have that $q_{\mathrm{observed}}(X'_\alpha, Y_\beta)$ is fixed to a constant. Thus, it suffices to show that $(X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}$ is a deterministic function of $Y_\beta$.

We start by considering the case $i = i_{\mathrm{mid}}(T_X) + 1$. In this case, $v_i(X'_\alpha, Y_\beta)$ is fixed to a constant. Indeed, since $i = i_{\mathrm{mid}}(T_X) + 1 \le i_{\mathrm{bot}}(T_X)$, it holds by our assumption that $(x, y) \notin B$ (see Equation (8.4)) that $v_i(X_\alpha, Y_\beta) = v_i(T_X)$ and so, since $X'_\alpha$ is a subsource of $X_\alpha$, $v_i(X'_\alpha, Y_\beta) = v_i(T_X)$.

The case $i > i_{\mathrm{mid}}(T_X) + 1$ follows by a different logic, similar to that used in the proof of Claim 8.5. Let us first consider the case $i = i_{\mathrm{mid}}(T_X) + 2$. Recall that $(X_\alpha)_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}$ is fixed to a constant. Thus, also $(X'_\alpha)_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}$ is fixed to a constant. Now, $v_i(X'_\alpha, Y_\beta)$ is defined to be one of the two sons of $\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))$ according to the Boolean value of the expression

$$\mathsf{Response}((X'_\alpha)_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}, Y_\beta, \mathsf{Challenge}((X'_\alpha)_{\mathrm{leftSon}(\mathrm{leftSon}(v_{\mathrm{mid}}(T_X)))}, Y_\beta)) = \mathsf{hasEntropy}.$$

Since $(X'_\alpha)_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}$ is fixed to a constant, the above equation is determined only by $Y_\beta$. This shows that $v_i(X'_\alpha, Y_\beta)$ is a deterministic function of $Y_\beta$ for $i = i_{\mathrm{mid}}(T_X) + 2$. A similar argument can be used to show that the same holds for any $i > i_{\mathrm{mid}}(T_X) + 1$.

To conclude the proof of the claim, we need to show that $(X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}$ is a deterministic function of $Y_\beta$ for $i > i_{\mathrm{mid}}(T_X)$. Recall that for any such $i$, $v_i(X'_\alpha, Y_\beta)$ is a descendant of $\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))$ determined only by $Y_\beta$. The claim then follows as $(X'_\alpha)_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))}$ is fixed to a constant. □

We are now ready to prove Claim 8.4. 

Proof of Claim 8.4. By Claim 8.6,

$$\mathsf{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y_\beta)}, (Y_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y_\beta)})$$

is a deterministic function of $Y_\beta$ for all $i > i_{\mathrm{mid}}(T_X)$. Thus, there exists a deficiency $O(\ell' \log^2 n) = o(\ell)$ subsource $Y'_\beta \subset Y_\beta$ such that for all $i > i_{\mathrm{mid}}(T_X)$,

$$\mathsf{NodePathCh}((X'_\alpha)_{v_i(X'_\alpha, Y'_\beta)}, (Y'_\beta)_{q_{\mathrm{observed}}(X'_\alpha, Y'_\beta)})$$

is fixed to a constant. By Theorem 5.1, and accounting for the error, there exist deficiency $2\ell' \log^2 n = o(\ell)$ subsources $X_{\alpha,\beta} \subset X'_\alpha$, $Y_{\alpha,\beta} \subset Y'_\beta$, such that with probability $1 - 2^{-\Omega(\ell)}$ over $(x, y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$ Equation (8.6) holds.

We note that this application of Theorem 5.1 is valid as both $X_{\alpha,\beta}$, $Y_{\alpha,\beta}$ are $2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(\log^{10} n)$. Indeed, $X_{\alpha,\beta}$ is a deficiency $O(\ell' \log^2 n)$-subsource of $X_\alpha = X_{\mathsf{FI}} \mid ((X_{\mathsf{FI}})_{\mathrm{leftSon}(v_{\mathrm{mid}}(T_X))} = \alpha)$. Since $X_{\mathsf{FI}}$ is a deficiency $O(\ell \log^2 n)$-subsource of $X$ and since $X_{v_{\mathrm{mid}}(T_X)}$ is $2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/4})$-block-source, it holds that $(X_{\mathsf{FI}})_{v_{\mathrm{mid}}(T_X)}$ is also $2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/4})$-block-source. Thus, $X_\alpha$ is $2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(k^{1/4})$. Therefore, $X_{\alpha,\beta}$ is $2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(\log^{10} n)$. A similar argument can be used for $Y_{\alpha,\beta}$. □

Claim 8.7. With probability 1 — 2 _n ^ over ( x,y ) ~ (A a ,p,Y aj p), it holds that 

Response (x,y, NodePathCh . d(Tjf ,(*,y), )) = hasEntropy. (8.8) 
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Proof. Let B be the event defined in Equation (8.4). As usual, we consider (x,y) G B as 
an “error” and ignore it for now. In particular, Vi mid (r x )(x,y) = t’ m id (Tx ) and the path 

^observed(a;,?/) is assumed to contain u top (T Y ). Thus, NodePathCh (xv imid(Tx) (x, y ), Observed (*,y)) 
contains BExt (y Utop (T Y ),x Vmid{Tx )) as a row. 

Recall that $Y_{u_{\mathrm{top}}(T_Y)}$ is $2^{-\Omega(k^{1/10})}$-close to an $\Omega(k^{1/2})$-block-source. Since $Y_{F_1}$ is a
deficiency $O(\ell \log^2 n)$-subsource of $Y$, and since $\ell \log^2 n = o(k^{1/10})$, $(Y_{F_1})_{u_{\mathrm{top}}(T_Y)}$ is also
$2^{-\Omega(k^{1/10})}$-close to an $\Omega(k^{1/2})$-block-source. By a similar argument, $(Y_{F_1})_{u_{\mathrm{mid}}(T_Y)}$ is
$2^{-\Omega(k^{1/10})}$-close to an $\Omega(k^{1/4})$-block-source. Thus, $(Y_\beta)_{u_{\mathrm{mid}}(T_Y)}$ is $2^{-\Omega(k^{1/10})}$-close to
having min-entropy $\Omega(k^{1/4})$, and so $(Y_\beta)_{u_{\mathrm{top}}(T_Y)}$ is $2^{-\Omega(k^{1/10})}$-close to an
$\Omega(k^{1/4})$-block-source. Therefore, $(Y_{\alpha,\beta})_{u_{\mathrm{top}}(T_Y)}$ is $2^{-\Omega(k^{1/10})}$-close to an
$\Omega(k^{1/4})$-block-source.

Recall that $X_{v_{\mathrm{mid}}(T_X)}$ is $2^{-\Omega(k^{1/10})}$-close to an $\Omega(k^{1/4})$-block-source. Since $X_{F_1}$ is a
deficiency $O(\ell \log^2 n)$-subsource of $X$, $(X_{F_1})_{v_{\mathrm{mid}}(T_X)}$ is $2^{-\Omega(k^{1/10})}$-close to an
$\Omega(k^{1/4})$-block-source. Thus, $(X_\alpha)_{v_{\mathrm{mid}}(T_X)}$ is $2^{-\Omega(k^{1/10})}$-close to having min-entropy
$\Omega(k^{1/4})$. Therefore, $(X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)}$ is $2^{-\Omega(k^{1/10})}$-close to having min-entropy $\Omega(k^{1/4})$.

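Let $\hat{X} \subset X_{\alpha,\beta}$, $\hat{Y} \subset Y_{\alpha,\beta}$ be any deficiency $20\ell' \log n$ subsources. We have that
$\hat{Y}_{u_{\mathrm{top}}(T_Y)}$ is $2^{-\Omega(k^{1/10})}$-close to an $\Omega(k^{1/4})$-block-source and that $\hat{X}_{v_{\mathrm{mid}}(T_X)}$ is
$2^{-\Omega(k^{1/10})}$-close to having min-entropy $\Omega(k^{1/4})$. Thus,
$\mathrm{BExt}(\hat{Y}_{u_{\mathrm{top}}(T_Y)}, \hat{X}_{v_{\mathrm{mid}}(T_X)})$ is $2^{-k^{\Omega(1)}}$-close to a uniform string on $\ell'$ bits.
In particular, $\mathrm{NodePathCh}(\hat{X}_{v_{\mathrm{mid}}(T_X)}, p_{\mathrm{observed}}(\hat{X}, \hat{Y}))$ is $2^{-k^{\Omega(1)}}$-close to
having min-entropy $\ell'$.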

Theorem 5.1 then implies that Equation (8.8) holds with probability at least
$1 - (2^{-\ell'} + 2^{-k^{\Omega(1)}}) = 1 - 2^{-\Omega(\ell')}$ over $(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$. This concludes the
proof of the claim.

□ 

8.3 Analysis of Step 3 

Recall that the output of the sub-extractor is defined as 

$$\mathrm{SubExt}(x,y) = \mathrm{BExt}\big( x_{v^{\mathrm{mid}}_{\mathrm{observed}}(x,y)} \circ x,\ y \big).$$

By Claim 8.4 and Claim 8.7, we have that except with probability $2^{-\Omega(\ell')}$ over
$(x,y) \sim (X_{\alpha,\beta}, Y_{\alpha,\beta})$ it holds that $v^{\mathrm{mid}}_{\mathrm{observed}}(x,y) = v_{\mathrm{mid}}(T_X)$. Recall that
$v_{\mathrm{mid}}(T_X)$ is a descendant of $\mathrm{leftSon}(v_{\mathrm{top}}(T_X))$. Further, recall that $(X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)}$ is
$2^{-k^{\Omega(1)}}$-close to having min-entropy $\Omega(k^{1/4})$. Since $X_{v_{\mathrm{top}}(T_X)}$ is $2^{-\Omega(k^{1/10})}$-close to an
$\Omega(k^{1/2})$-block-source, we have that $(X_{\alpha,\beta})_{v_{\mathrm{top}}(T_X)}$ is $2^{-\Omega(k^{1/10})}$-close to an
$\Omega(k^{1/4})$-block-source. In particular, this implies that $(X_{\alpha,\beta})_{v_{\mathrm{mid}}(T_X)} \circ X_{\alpha,\beta}$ is
$2^{-k^{\Omega(1)}}$-close to an $\Omega(k^{1/4})$-block-source.

Recall also that $Y_{\alpha,\beta}$ is $2^{-\Omega(k^{1/10})}$-close to having min-entropy $\Omega(k)$. Thus, by Theorem 4.1
we conclude that

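$$\mathrm{SubExt}(X_{\alpha,\beta}, Y_{\alpha,\beta}) = \mathrm{BExt}\big( (X_{\alpha,\beta})_{v^{\mathrm{mid}}_{\mathrm{observed}}(X_{\alpha,\beta}, Y_{\alpha,\beta})} \circ X_{\alpha,\beta},\ Y_{\alpha,\beta} \big)$$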

is $2^{-k^{\Omega(1)}}$-close to uniform.
9 Conclusion and Open Problems

The next natural quantitative goal. In this paper, we gave a construction of a
$2^{\mathrm{poly}(\log\log n)}$-Ramsey graph, or equivalently, a two-source disperser for entropy polylog(n). Erdős set the
goal at constructing $O(\log n)$-Ramsey graphs, which translates to the difficult problem of
constructing two-source dispersers for entropy $\log(n) + O(1)$. We set the next goal towards
Erdős' challenge at constructing a $\mathrm{polylog}(n) = 2^{O(\log\log n)}$-Ramsey graph, which is equivalent
to a two-source disperser for entropy $O(\log n)$.
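
The three targets above can be lined up in a single parameter table. The following is only a bookkeeping sketch, and its symbols are assumptions introduced here for readability: $N$ denotes the number of vertices, $m = \log_2 N$ the length of each source, $k$ the disperser entropy, and $K = 2^k$ the resulting Ramsey parameter. It relies on the standard correspondence in which a two-source disperser for entropy $k$ on $m$-bit sources yields a $2^k$-Ramsey graph on $2^m$ vertices.

```latex
% Assumed notation (not from the text): N = number of vertices, m = \log_2 N = source length,
% k = disperser entropy, K = 2^k = Ramsey parameter of the induced graph.
\begin{align*}
k = \mathrm{polylog}(m) &\;\Longrightarrow\; K = 2^{\mathrm{poly}(\log\log N)}
   && \text{(this paper)}\\
k = O(\log m) &\;\Longrightarrow\; K = 2^{O(\log\log N)} = \mathrm{polylog}(N)
   && \text{(proposed next goal)}\\
k = \log m + O(1) &\;\Longrightarrow\; K = O(m) = O(\log N)
   && \text{(Erd\H{o}s' goal)}
\end{align*}
```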

A weakly-explicit construction. Our construction of Ramsey graphs is strongly-explicit,
namely, one can query each pair of vertices of the n-vertex graph to check whether there
is an edge connecting them, in time polylog(n). In the setting of two-source dispersers, a
strongly-explicit construction is the natural definition. However, we believe it is interesting
to obtain better weakly-explicit Ramsey graphs, where by weakly-explicit we mean that the
entire graph can be computed in time poly(n). Barak et al. [BKS+10] have a simple
construction of a polylog(n)-Ramsey graph; however, its running time is $2^{\mathrm{polylog}\, n}$. Other than
that, we are not aware of any result in this direction.
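
To make the two notions of explicitness concrete, here is a minimal Python sketch on a toy example. It is not the construction of this paper: it uses the Paley graph on a prime number of vertices q with q ≡ 1 (mod 4), chosen only because its edge relation is easy to state, and the function names `paley_edge`, `build_graph` and `is_k_ramsey` are hypothetical. A single edge query costs one modular exponentiation (the strongly-explicit style), whereas writing down the whole adjacency matrix takes time polynomial in the number of vertices (the weakly-explicit style). The brute-force Ramsey check is exponential in k and is meant for tiny instances only.

```python
from itertools import combinations


def paley_edge(u: int, v: int, q: int) -> bool:
    """Strongly-explicit style query: decide a single edge in time polylog(q).

    Assumes q is a prime with q % 4 == 1, so that the relation is symmetric.
    """
    if u == v:
        return False
    # Euler's criterion: d != 0 is a quadratic residue mod q iff d^((q-1)/2) == 1 (mod q).
    return pow((u - v) % q, (q - 1) // 2, q) == 1


def build_graph(q: int) -> list[list[bool]]:
    """Weakly-explicit style: compute the entire adjacency matrix in time poly(q)."""
    return [[paley_edge(u, v, q) for v in range(q)] for u in range(q)]


def is_k_ramsey(adj: list[list[bool]], k: int) -> bool:
    """Brute-force check that the graph has no clique and no independent set of size k."""
    n = len(adj)
    for subset in combinations(range(n), k):
        pairs = [adj[u][v] for u, v in combinations(subset, 2)]
        if all(pairs) or not any(pairs):
            return False  # found a clique (all edges) or an independent set (no edges)
    return True


if __name__ == "__main__":
    graph = build_graph(17)
    print(is_k_ramsey(graph, 4))
```

The 17-vertex Paley graph is the classical witness for R(4,4) > 17, so it serves as a small sanity check for `is_k_ramsey`; it says nothing about the asymptotic parameters discussed in this section.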

Improved sub-extractors. The two-source sub-extractor that we construct has inner-entropy
$k_{\mathrm{out}}^{\Omega(1)}$ or even $k_{\mathrm{out}}/\mathrm{polylog}(n)$, where $k_{\mathrm{out}}$ is the outer-entropy. We pose the problem
of constructing a sub-extractor with inner-entropy $k_{\mathrm{in}} = \Omega(k_{\mathrm{out}})$ or even $k_{\mathrm{in}} = k_{\mathrm{out}} - o(k_{\mathrm{out}})$,
for $k_{\mathrm{out}} = \mathrm{polylog}(n)$. We believe that this is a natural goal towards constructing two-source
extractors for polylogarithmic entropy.

Affine dispersers. Shaltiel [Sha11] adjusted the challenge-response mechanism so as to work
with a single affine-source, rather than with two weak-sources. This allowed him to construct
affine-dispersers for entropy $2^{(\log n)^{0.9}}$. Can one obtain affine-dispersers for polylogarithmic
entropy given recent advances? 